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Abstract 



Much of the literature on optimal design of experiments has focussed on experiments where 
the behaviour of the system is approximated by a linear model, such as a low-order poly- 
nomial. In many areas such as pharmacology and chemistry, such approximations are not 
appropriate, as the underlying mechanisms produce highly nonlinear or categorical responses. 
This thesis addresses some issues with the optimal design of experiments in these situations. 

Commonly used criteria for the 'optimal' design of experiments relate to optimality in 
terms of efficient estimation of model parameters. However, quite often another important 
objective of an experiment is to select the model structure which best describes the under- 
lying behaviour of the system. We examine existing criteria for model discrimination for 
both nonlinear and generalised linear models, and combine them with criteria for parameter 
estimation in order to create designs which address both objectives. We show that these 
designs can be quite efficient in terms of each of the criteria. 

Further complications in the design process arise with the use of mixed effects models, 
that is when some model parameters are allowed to vary randomly between blocks or clusters 
of units. The Fisher information matrix is involved in the calculation of many optimality 
criteria, and this matrix cannot be written down in closed form for many nonlinear and 
generalised linear mixed effects models. We instead rely on approximations to the informa- 
tion matrix to generate optimal designs. This thesis gives details of the use of an existing 
approximation to the matrix in the optimal design of a complex pharmacokinetic experiment 
involving nonlinear mixed effects models. We also investigate several alternative approxima- 
tions to the matrix for logistic regression with random coefficients, with an application in 
pharmacodynamics: the design of a cross-over trial with a binary response. 



vi Abstract 

Regardless of the criterion used to select a particular design, we require a method to 
search the design space for points which maximise or minimise the criterion. The models 
considered in this thesis are assumed to have predictor variables taken over a continuous 
range, so combinatorial optimisation techniques such as the tabu algorithm are not appro- 
priate. Instead we make use of the simulated annealing algorithm (modified for continuous 
variables) and a relatively new algorithm known as the cross-entropy method. Both algo- 
rithms are implemented in programs written for the MATLAB package. 
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Chapter 1 



Introduction 



The increasing use of optimisation techniques and modern powerful computers has lead to 
the use of more complex and realistic models in data modelling. However, judging by a 
review of the literature in the optimal design of experiments (Silvey, 1980; Atkinson and 
Donev, 1992; Pukelsheim, 1993), the majority of research in this area remains focussed on 
linear models, while in fact nonlinear and generalised linear models (GLMs) are commonly 
used in many areas such as chemistry and pharmacology. Atkinson and Donev (1992), for 
example, gives an excellent introduction to the theory of optimal design of experiments, but 
only devotes one chapter to nonlinear models, and a section of the 'Further Topics' chapter 
on generalised linear models. 

Optimal design has been studied since as early as 1918, when the paper by Smith (1918) 
defined the objective of minimising the worst-case prediction error in the construction of 
polynomial models. In this paper, she introduced what is now known as G-optimality (which 
minimises the maximum over the design space of the standardised variance, as defined by 
Kiefer and Wolfowitz (1959)). In the same paper by Kiefer and Wolfowitz, they gave the 
name of D-optimality to the criterion introduced by Wald (1943), which puts the emphasis 
on the quality of the parameter estimates. A D-optimal design maximises the determinant 
of the Fisher information matrix, hence minimising the volume of the confidence ellipsoid 
of any unbiased parameter estimates. Kiefer and Wolfowitz (1959) also relate these two 
alphabetic criteria by the General Equivalence theorem. 
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All of the papers mentioned so far only consider models which are linear in their para- 
meters, i.e. models of the form 

y = r](X,0) + e = X0 + e, (1.1) 

for explanatory variables X , model parameters 0, and zero-mean residual error terms e (these 
models are introduced more formally in the next chapter). Certain complications arise in 
the optimal design of experiments when the function r\ is not linear in the parameters, the 
most notable of which is that nonlinear models require estimates of the model parameters 
to be known for an optimal design to be constructed. Typically initial parameter estimates 
are based on the results of previous studies or 'expert' guesses. Optimal designs found using 
point estimates are said to be locally optimal. 

Although work on the theory of optimal design for linear models dates back as far as 
1918, nonlinear models did not appear in the optimal design literature until 1959. Box 
and Lucas (1959) investigated locally D-optimal designs for nonlinear models, in which the 
authors justified the dependence of such designs on prior knowledge of parameter values 
by stating that "in practical problems it will almost invariably be the case that some such 
information is available, and this will then provide the basis of a first design" . The literature 
on optimal design for nonlinear models has been relatively sparse due to this dependence on 
parameter values. (One approach to avoiding the need for point estimates in optimal design 
is the use of Bayesian experimental designs, in which prior distributions are placed on the 
model parameters. These Bayesian approaches are outside the scope of this thesis, but are 
reviewed in Chaloner and Verdinelli (1995) and Clyde (2001).) 

In the area of pharmacokinetics (PK, the study of the time course of the changing con- 
centration of a drug in the body), for example, where nonlinear models are commonly used, 
most work on optimal design has been theoretical, with little in the way of application. The 
literature in the area begins with D'Argenio (1981), who discussed the optimal choice of 
sampling times in a PK study, but confined the optimisation to a single subject. 

The increasing use of mixed effects models (such as random coefficient models and models 
with random block effects) has led to a growing interest in optimal design for such models. 
Optimal design for linear mixed effects models has been addressed by several papers (Cheng, 



1995; Atkins and Cheng, 1999; Berger and Tan, 2004). Mentre et al. (1997) propose an 
approach to optimal design for random-effects regression models in which nonlinear models 
are hrst linearised. This paper by Mentre et al. has led to an increasing use of optimal design 
in population PK studies (i.e. where the models involve between subject variability). Two 
prospective evaluations of an optimally designed PK study have been published (Mentre 
et al., 2001; Green and Duffull, 2003), with the third (Waterhouse et al., 2005) recently 
submitted for publication, also appearing as a chapter of this thesis. 

There have been several papers on optimal design for efficient estimation of parameters 
in generalised linear models (Chaloner and Larntz, 1989; Atkinson et al., 1995; Atkinson 
and Haines, 1996), but the literature in this area tends to be even more sparse than in 
the area of nonlinear models, due to the increased theoretical and computational challenges 
associated with GLMs. GLMs are commonly used in experiments in areas such as chemistry, 
pharmacology and engineering (see Myers et al. (2002) for examples of applications), so 
optimal design for these models warrants further consideration. 

Longford (1994) is concerned with estimation in logistic regression models with random 
coefficients, and presents an approximation to the Fisher information matrix which may 
be useful in optimal design. However, there appears to be very little, if any, literature on 
the topic of optimal design for generalised linear mixed models. This thesis addresses some 
formidable problems associated with the construction of optimal designs for these models. 

The majority of work on optimal design for nonlinear and generalised linear models has 
focussed on parameter estimation. Another common goal of an experiment is to choose be- 
tween two or more competing model structures. Atkinson and Fedorov (1975a) introduced 
the T-optimality criterion for discriminating between two nonlinear models, in which one of 
the two models is assumed to be true (with known parameter values) and the noncentrality 
parameter for lack of ht of the false model is maximised. This was generalised to accommo- 
date for more than two competing models in Atkinson and Fedorov (1975b), and analogous 
criteria were defined for generalised linear models in Ponce de Leon and Atkinson (1992). 

This thesis addresses a number of issues related to optimal design for nonlinear and 
generalised linear models. After introducing some notation and basic concepts of optimal 
design (including the search methods used in this thesis) in Chapter 2, the thesis is divided 
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into two parts: Part I relates to nonlinear models, while Part II is concerned with generalised 
linear models. 

Chapter 3, the first chapter in Part I, describes a number of methods for finding designs 
which are near-optimal in terms of both parameter estimation and model discrimination. 
Several examples are given to demonstrate their effectiveness. Chapter 4 describes the 
construction of an optimal design involving nonlinear mixed effects models with multiple 
responses, where the objective of the design is to discriminate between two competing models 
as well as estimate the parameters well under each model. The work is motivated by a real- 
life pharmacokinetic study. 

Part II starts with Chapter 5, an analogy to Chapter 3 for generalised linear models, in 
which the aim is to construct a design which is useful in terms of both parameter estimation 
and model discrimination. The effectiveness of the designs in terms of model discrimination is 
evaluated by power tests. Chapter 6 is motivated by another example in pharmacometrics, a 
pharmacodynamic study of a drug with a binary response. A range of locally optimal parallel 
(single period) and crossover designs are constructed for a logistic regression model, and the 
sensitivities of the designs to misspecification of the parameter values are examined. A union 
of parallel and crossover designs is also considered, and the loss of efficiency resulting from 
allocating more patients to the parallel group is investigated. The last chapter in Part II, 
Chapter 7, outlines a number of approaches to optimal design for generalised linear mixed 
models. The model used in the example of Chapter 6 is extended to allow the parameters 
to vary randomly between patients and the corresponding optimal designs are compared to 
the designs for the fixed effects model in the previous chapter. 

The thesis ends with some concluding remarks and discussion of future work in the area 
in Chapter 8. 



Chapter 2 



Preliminaries 



This chapter introduces the concept of an experimental design, the types of models that are 
used in this thesis, as well as some criteria used to optimise the designs for these models, for 
a number of objectives. Finally, the search algorithms used to perform these optimisations 
are described. 



2.1 Approximate and exact experimental designs 

An experimental design £ with n support points can be written as 





(2.1) 

where $ >i G X are the support points (m-vectors also known as elementary designs) consisting 
of the explanatory variables which describe the experimental conditions, with weights wi G 
[0, 1] summing to 1. The design space can be written S = {£ G X n x [0, l] n : YH=i w i = !}■ 
For an exact design with N experimental units, the weights are given explicitly by Wi = ri/N, 
where r, is the number of replications of ^ i . For an approximate (or 'continuous') design, 
the weights also represent the proportion of experimental effort at each point, but the design 
is not constrained to apply to a hxed number of experimental units. Methods for choosing 
this design in an 'optimal' way are described below. 
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2.2 Models 

2.2.1 Linear models 

In the context of this thesis, linear models are considered to be models which are linear in 
their parameters, that is they are of the form 

y = n(X,0) + e = X0 + e, (2.2) 

for the n x p matrix X = (x[, . . . , x^)' (the elements of Xi are functions of the support 
points £J and the px 1 vector of parameters 0, where the elements of e are the residual 
error terms, usually assumed to be independently normally distributed with zero mean and 
constant variance a 2 , and '" denotes the vector/matrix transpose. A common example of a 
linear model is a polynomial such as the quadratic with one explanatory variable: 

rj(x,0) = 6 + 6 x x + 6 2 x 2 . 

These types of models have been covered extensively in the optimal design literature, and 
are not the focus of this thesis. 

2.2.2 Nonlinear models 

The use of nonlinear models gives rise to some interesting problems in optimal design. The 
models are considered to be of the form 

y = n(X,0) + e, (2.3) 

for the nxp matrix X and the p x 1 vector of parameters 0, where the elements of e are again 
typically assumed to be independently normally distributed with zero mean and constant 
variance a 2 . This is a generalisation of the model in Equation (2.2), where the function n is 
no longer constrained to be linear in the model parameters, that is it cannot be expressed as 
X0. One example of a nonlinear model is the three-parameter quadratic Michaelis-Menten 
model with a single explanatory variable: 

9\x 



n(x,0) 



62 + x + 6 3 x 2 ' 
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2.2.3 Generalised linear models 

Generalised linear models (GLMs) encompass a large class of models, from simple linear 
regression models to models for quantal responses to models for survival data. They can be 
studied as a single class, and are all denned by three characteristics: 

1. The distribution of the n x 1 vector of independent responses, Y = (Yi, . . . , Y n )', with 
means E(Y^) = ^ and variances Var(F i ) = a((/))V (iTi) , where a ((f)) is a scale factor 
which doesn't depend on 7Tj. We write n = (iri, . . . , 7r„)'. 

2. The linear predictor 

77= (r] U ...,r] n )' = XO, 

where X is the n x p matrix of known functions of the m explanatory variables, and 
= (6i, ... , Op)' is the p x 1 vector of model parameters. 

3. The link function g, providing the link between the mean vector n and the linear 
predictor 77: 

g(^i) = Vi- 

The GLMs considered in this thesis are generally for binomial data, where Si successes are 
observed from ra; trials (i.e. Si ~ Bin(mi,7Tj)), and the proportion of successes is written as 
Yi = Si/rrii. The expected value of Yi is then the probability of success 7Tj. Some common link 
functions for binomial data are the logit link function, g(iTi) = logit(7Tj) = log{7Tj/(l — vr^)}; 
the probit link function, g(iri) = $ _1 (7ri), where $ is the cumulative distribution function 
of the standard normal distribution; and the complementary log-log link function, g(^i) = 

l0g{-l0g(l-7T;)}. 

It is also worth defining here the deviance of the fit of a generalised linear model for 
binomial data: 

D(Y;<k) = 2£(Y;Y)-2e(ir;Y) (2.4) 

n 



n 

«'; 



Y % log(F l / 7 r l ) + (1 - Y^ log ( l —^ 

1 - -Ki 
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where 7r = (tti, . . . , 7r re )' is the vector of fitted values of n from the data Y, and l is the 
log-likelihood. For observed values y = (yi, ■ ■ ■ ,y n )' of the random variables in Y, the 
log-likelihood can be written 

n r / 

7T« 



^(tt;?/) = X] Vilog ( 



1 — 7T,' 



+ mi l0g(l - TTi) 



2.3 Design criteria 



2.3.1 Fisher information matrix 

For a model involving the parameter vector and an experimental design £ as described in 
Equation (2.1), the expected Fisher information matrix is defined as 

dH(0; Y) 



M(0,O = E 



SflSfl' 



where £(0; Y) is the log-likelihood of the N x 1 vector of observations Y = (Yi, . . . , Iat)'. 

For linear models, the information matrix does not depend on the parameters 0, and is 
given by 

M(£) = X'WX, (2.5) 

where W = diag(«;i, . . . , w n ). 

For models which are nonlinear in their parameters, the information matrix now depends 
on 0, and is calculated by 

M(0,i) = F'WF, (2.6) 

where F is the n x p matrix of first partial derivatives: 



d9 



dr](x„,9) 

de 



The information matrix for GLMs for binary data also depends on the model parameters, 
and is given by 

M(0,i) = X'WX (2.7) 

where W is now a function of more than just the design weights, 



W = diag 



W; 



f dTTj 



7T;(1 - TTi) \dr)i 
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The dependence of the information matrix on the parameter values is via the W ma- 
trix, whose elements are functions of 7^, which are in turn functions of 0. For exam- 
ple, the information matrix for a logistic model for binomial data involves the matrix 
W = diag(?i>;7ri(l — 7Tj)), where 7T; = logit" (x-0). 

2.3.2 Parameter estimation 

D-optimality 

The Cramer-Rao Lower Bound states that the variance-covariance matrix of any unbiased 
estimator of is bounded below by M~ 1 (0, £). It follows that any population design which 
maximises \M(0, £)| (where \A\ denotes the determinant of the matrix A) minimises the 
volume of the joint asymptotic confidence ellipsoid for the parameters. Such a design is 
called D-optimal. 

The ability of a design to efficiently estimate the parameters of a model can be addressed 
by its efficiency compared to the D-optimal design, i.e. its D-efficiency, 

where (;* D is the D-optimal design and p is the number of parameters in the model. More 
generally, we can compare two designs £1 and £2 using the relative efficiency of one compared 
to the other: 



\M(e,( 2 

Other criteria 

A large number of alphabetic optimality criteria exist for the creation of locally optimal 
designs for the estimation of parameters, such as: A, where the sum of the variances of the 
parameter estimates is minimised (by minimising the trace of the inverse of the information 
matrix); c, in which the variance of a linear combination of the parameters is minimised; 
G, which minimises the maximum over the design space of the standardised variance (the 
variance of the predicted response scaled for sample size and residual error variance); and 
D s , where the interest is in estimating a subset s of the parameters as precisely as possible. 
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See Atkinson and Donev (1992), for example, for a review of these criteria and more. 
This thesis will be mainly concerned with the use of T>-optimality for parameter estimation. 

2.3.3 Model discrimination 

T-optimality 

Suppose that the response can be described by the nonlinear model in Equation (2.3), where 
the true function rj is either of two candidate functions r]i(x, 0{) and r] 2 (x, 2 ) with x G X 
and r G Q r . The construction of T-optimal designs, as described by Atkinson and Fedorov 
(1975a), depends on the assumption that one of these models is true, and that its parameters 
values are known. If the hrst model is arbitrarily chosen as true (and therefore 0\ is assumed 
to be known), and we write 

Vt(x) = 77i(x,0i), 

then the T-optimal design can be defined, using notation similar to that of Ucihski and 
Bogacka (2002), as 

£* T = argmax i min J(£, 2 ) i , (2.8) 



where 



J(£, 2 ) = $>i||ifc&) - nit* 2 )\\\ (2.9) 



»=i 
where || • || is the Euclidean norm and Wi are the design weights. Write 

Of = arg min J(C T ,0 2 ). 
O2&&2 

As noted by Atkinson and Fedorov (1975a), there are certain pairs of models (eg. nested 
models) for which it is meaningless to design these types of experiments when parameter 
values are unrestricted. In such cases it is necessary to place further constraints on 2 . 

Ponce de Leon and Atkinson (1992) defined the analogous T-optimality criterion for 
GLMs. Assume again that we have two candidate models for binary data, this time defined 
by their linear predictors rj 1 and r) 2 , and their link functions g\ and g 2 , with the first 
model assumed to be true and its parameter values known. A T-optimal design in this 
case maximises the deviance arising from the fit of model 2 when the data are the expected 
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responses of model 1. That is, the T-optimality criterion is the deviance Z)(77i; 7V2), where 
D(-) is the function defined in Equation 2.4. 

2.4 Search methods 

The criteria outlined in Section 2.3 are used in a search of the design space in order to obtain 
the optimal experimental design. These criteria often form a surface over the design region 
with many local extrema. It is therefore important to use a global search method which 
performs a thorough search of the design space to avoid 'getting stuck' in local optima. 
Two such methods which have proved to be useful for optimal design in the context of this 
thesis are presented here. As the problems in this thesis are concerned with optimisation of 
continuous variables (such as sampling times and dose levels) rather than discrete variables, 
the continuous version of each algorithm is described here (both methods have evolved from 
combinatorial optimisation methods). 

2.4.1 Simulated annealing 

A search routine employing simulated annealing (SA) for continuous variables can be used 
to maximise the various design criteria. The implementation used in this thesis is based on 
the algorithm of Corana et al. (1987). Also see Goffe et al. (1994) for examples of its use. A 
brief overview of the procedure is given here. 

Consider the design £ in Equation (2.1) as an (m + 1) x n matrix with elements £ij. 
In a constrained optimisation (the only type considered in this thesis), each element of this 
design matrix has a corresponding lower and upper bound. Write the matrices defining these 
bounds as L and U. Since the (m + l)th row corresponds to the design weights, we always 
have £( m+ i),j = and U( m +i),j = 1 for j = 1, . . . , ra. 

At the kth iteration of the algorithm, each element of £ (i.e. each of the support points 
and the weights) is perturbed in sequence to give £*-. Thus each iteration involves finding 
(m + l)n new designs. The size of the perturbations are defined by a matrix V = {vij} of 



step sizes. The value of £* • is taken from a uniform distribution on the interval £* - l ± v 



i,j- 
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The Vij are adjusted at specified intervals so that, on average, around half of the new designs 
are rejected according to the rule described below. 

For each new design, its criterion C(£ fc ) is calculated and compared to the criterion for the 
current 'best' design £*. Designs with higher criteria are always accepted, designs with lower 
criteria are accepted or rejected according to the Metropolis criterion, i.e. the acceptance 
probability is exp[(C(£ fe ) — C(£*))/Tfc] for the current temperature Tf.. The temperature is 
lowered at specified intervals by geometric cooling, Tfc +1 = aT^ (a = 0.9 for these examples). 
The initial temperature is calculated after an initial run (of usually 100 or more iterations), 
with To usually chosen such that the initial probability of accepting 'inferior' designs is 
close to 95%. Designs containing points outside the ranges defined by L and U are always 
rejected. 

The algorithm stops when the average step size has reached a suitable value, i.e. the 
stopping criterion is ^2 i j v i,j/[( m + l) n ] < tol, for a pre-specified small tolerance, tol. This 
differs slightly from the algorithm of Corana et al. (1987), who instead rely on the criterion 
reaching a stable value, regardless of the step sizes. Sufficiently small step sizes ensure that 
the criterion does in fact reach a stable value. 

MATLAB code is given in Appendix A.l for the algorithm as used in design optimisation 
problems in this thesis. 

2.4.2 Cross-entropy 

The cross-entropy (CE) method (Rubinstein and Kroese, 2004) was originally developed as 
an algorithm for estimating probabilities of rare events in stochastic networks. It was then 
adapted to, among other applications, combinatorial optimisation and, more recently, to 
continuous multi-extremal optimisation (Kroese et al., 2005). 

Again consider the design £ as an (m+1) x n matrix with elements dj, with corresponding 
lower and upper bounds given by the matrices L and U . To apply the CE method to our 
optimal design problems more easily, we reshape each of these three matrices to (m + l)nxl 
vectors. We associate the elements of £ with independent truncated normal distributions, 
with lower and upper limits defined by the elements of L and U . The means and variances 
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of these distributions are given by /x = (pi, . . . , fJ.( m +i)n)' an d cr 2 = (a 2 , . . . , cr 2 m+1 ^)'. The 
truncated normal distribution for each element of £ is then written as £.,- ~ N t (pj, <r|, Zj, LT,-), 
for j = 1, ... , (m + l)n. We are interested in finding the point fi* which relates to the optimal 
design £*. 

We begin by choosing initial estimates of fi and cr, written fi Q and <Xo- Usually fi is 
chosen as the centre of the rectangular design region, L + ( U — L)/2, and cr is ( U — 
L)/A. For the kth. iteration of the algorithm, we draw a random sample £ 1 , . . . , £ N from the 
Nt(/x fe _ 1 , o - fe _ 1 , i, C7) distribution. Let 2" be the indices of the iV ellte (= piV, typically p = 0.1) 
best performing samples, as chosen using the optimality criterion. For j = 1, . . . , (m + l)n, 
we let 

^• = E€-/^ elite 

and 

~2 Y^/^i ~ \2 /Arelite 



^■ = 2^(€--^-)7iv € 

Finally, the parameters are smoothed by the following: 

Afc = «lAfc + (1 " 0<l)£fc-l, O-fc = «20"fe + (1 - «2)^_ 1 , 

where «i and «2 are the fixed smoothing parameters, with typically a\ = 0.9 and a-i = 0.3. 
The algorithm converges when maxj(ffij) < tol for some fixed tol (tol = 0.01 is used in the 
problems in this thesis). 

Additionally, we use the 'injection' method described in Botev and Kroese (2004), in 
which every time the stopping criterion is met, the standard deviations are inflated by 
adding 

\ci-cu\h 

for some h between 0.1 and 10, where CI is the best value of the criterion obtained at 
iteration k. The whole process is then repeated a number of times. This is used to avoid 
settling into inferior local optima. 

MATLAB code is given in Appendix A. 2 for the algorithm as used in design optimisation 
problems in this thesis. 
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2.4.3 Comparison of algorithms 

A direct comparison of the two algorithms described above, in terms of their run times and 
ability to hnd the global optimum, is difficult, considering the need for quite a large number 
of parameters to be controlled for each algorithm. In each case we could set these parameters 
so that the algorithm runs for a long time, to be confident of finding a global optimum. For 
example, in the simulated annealing algorithm we could set the number of iterations between 
step size changes to be very large, and the stopping criterion 'tol' to be very small. This 
may virtually guarantee that the global optimum is found, but it is possible that a much 
shorter search will produce the same design. 

As a typical example of optimisation problems tackled in this thesis, a comparison of 
the two methods was carried out by finding D-optimal designs using the logistic regression 
model described in Section 6.4.2. The default values of the algorithm parameters (as defined 
in the code in Appendix A) were used in both cases, except that no 'injections' were used 
in the cross-entropy method. Both algorithms converged to the same design (as given in 
Section 6.4.2), but SA was slightly slower in this case, converging in 144 seconds, as opposed 
to 138 seconds for CE. 

It is encouraging that the two methods converged to the same design, but we cannot make 
strong judgements about the computation time. The slower algorithm in this case, SA, was 
run again with the numbers of cycles between changes in step size and temperature reduced 
from the default values of 20 and 10 to 10 and 5, respectively. The algorithm converged to 
the same solution, but this time in 79 seconds, considerably faster than the first run, even 
considerably faster than the first run of the cross-entropy method. However, we may be able 
to adjust the parameters of the CE algorithm to obtain the same result in even faster time, 
and so on. 

In most of the problems in this thesis, rather than aiming for the algorithm which has the 
greatest speed, we are instead more concerned with finding the global optimum. Simulated 
annealing seems to consistently find the global optimum designs for these default values of 
the algorithm's parameters, regardless of the models under consideration. Reasonable values 
of the parameters for the cross-entropy method, on the other hand, seem to be much more 
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parameter dependent. In light of this, simulated annealing is used for most problems in this 
thesis, with CE used as a 'backup' to check for global optimality when practical. 
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Part I 



Nonlinear models 
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This part of the thesis investigates methods of optimal design for nonlinear models when 
the objective is to both effectively discriminate between two competing models and efficiently 
estimate the parameters of both models. A number of methods involving multiple criteria are 
introduced, and their use is demonstrated in a number of examples. An example of a real-life 
optimal design is also given where the objective is both discrimination and estimation. The 
models are extremely complex, involving random effects, multiple responses, and systems of 
ordinary differential equations with no analytic solution. 
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Chapter 3 

Designs for discrimination and 
estimation in nonlinear models 



Research into the optimal design of experiments has in the main concentrated on optimisa- 
tion with respect to parameter estimation. An experimental design is 'optimised', that is the 
best choice of runs, experimental units, etc. is found through the use of an optimality crite- 
rion. The most commonly used criterion is D-optimality, described in Section 2.3.2, which 
minimises the determinant of the variance-covariance matrix of the parameter estimates. In 
comparison, relatively little research has been carried out into criteria and techniques for 
discriminating between competing models for an experimental design. Atkinson and Fe- 
dorov (1975a, b) introduced T-optimality as a criterion for discriminating between models. 
T-optimality has not been widely used, due in part to the computational burden involved in 
the optimisation for even relatively simple models, but also due to the theoretical challenges 
it presents (see Ucihski and Bogacka (2002)). 

It is unclear as to whether T-optimal designs are good D-optimal designs and vice versa. 
It would be very useful and advantageous to be able to have designs that perform well 
for both model discrimination and model parameter estimation simultaneously. Compound 
criteria, such as the product of D-optimality criteria (Atkinson and Cox, 1974) have been 
suggested, and this is followed up on here. Other uses of composite or compromise criteria for 
optimal design are described in Stigler (1971), Studden (1982) and Cook and Wong (1994). 

21 
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In this chapter, criteria and methods are introduced which are aimed at achieving designs 
that efficiently discriminate between models and yield efficient parameters estimates. Prod- 
uct, conditional and hybrid optimality criteria are developed and applied to both linear and 
nonlinear models. The resulting designs are evaluated in terms of both model discrimination 
and parameter estimation. 

3.1 Optimality criteria for two competing models 

Compound or composite criteria, such as those suggested by Atkinson and Cox (1974), are 
usually aimed at combining two (say) design properties in which we are interested. As 
mentioned above it is not known whether a design can be simultaneously T- and D-optimal. 
The product optimality criterion based on the work of Atkinson and Cox is defined in this 
section, along with several alternative criteria which are aimed at achieving designs with 
some degree of effective model discrimination and efficient parameter estimation. Each of 
the proposed criteria use either or both T- or D-optimality objective functions as defined in 
Section 2.3. 

In this chapter we only consider approximate designs, i.e. designs of the form 




as described in Section 2.1. Optimal designs presented here are written 

r = 




where x* = (£ l5 . . . , £ n ) and w* = {w\, . . . , w n ). 

Suppose that the rth of our two competing models for the observations can be given by 

Vi = Vr(£i, &r) + £i (i = 1, ■ ■ ■ , n; r = 1, 2), 

with 6i ~ N(0, a 2 ), and write Y as the nxl vector of responses and 6 r = (6 r \, . . . , 9 rnr )' ■ 
As we are considering T-optimal designs, among others, in this chapter, we require that one 
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of the two models is assumed to be true. Suppose that the hrst model is true (and therefore 
assume that 0i is known), and write 

3.1.1 T-optimality and T-efficiency 

Recall from Section 2.3.3 that the T-optimal design is defined as 

£* T = argmax < min J(£, 2 ) \ , (3.1) 



where 

n 

j{t, 2 ) = $>ilM€i) - %(&, o 2 )\\ 2 , (3.2) 

where || • || is the Euclidean norm and Wi are the design weights. We have also defined 
0f = argmin e2G 2 J(£* T , 2 ). 

A comparison of sequentially constructed designs to the T-optimal design is given for a 
pair of linear models in Atkinson and Donev (1992, p. 235). The 'efficiency' of a design £ in 
terms of model discrimination used here is the ratio 

J& gg) 

j(t* T ,0?y 

However, the choice of 2 in the numerator presents a problem which is not addressed in 
Atkinson and Donev (1992). The T-optimal design gives the greatest separation between 
the response of the two models for the 'worst-case scenario' of 2 , so it makes sense to assess 
the new design £ in a similar manner, i.e. let 2 = 2 = argmhi0 2G e 2 J(£, #2)- On the other 
hand, the competing .D-optimal designs for nonlinear models rely on the prior specification 
of 2 = 0* 2 , the same value used in the T-optimal design. One may argue that in this case 
it is fair to judge the design using the same assumption of parameter values as was used to 
find the design. 

In this light, both methods of T-efficiencies, termed T e a ff and T e fe ff , are included in this 
chapter: 

rpa (t\ _ ^(£' gg) rpb (t\ _ ^(£' ^2 ) (o o\ 

eff(0 " j(e T , 0?y eff(0 " j(e r , o?Y { } 

where 2 = argmin 02e e 2 J(£, 2 ). 
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3.1.2 Product optimality 

If the objective of our experiment is not model discrimination, but efficient estimation of both 
sets of parameters of two competing models, then we may wish to maximise the product of 
the two D-optimality criteria, scaled for the number of parameters, as suggested by Atkinson 
and Cox (1974). The concept and definition of D-optimal designs were introduced in Section 
2.3.2. The notation is now extended to account for the inclusion of multiple models in the 
design process. 

The product optimal design for the two models is defined as 

r P = argmax|M 1 (0 1 ,Or M |M 2 (0 2 ,Or /p2 , (3.4) 

where M r (0 r ,^) is the information matrix for the rth model and r is the vector of the p r 
parameters in the model. Here equal importance is placed on each model. The exponents 
1/pi and l/p 2 may be adjusted if we wish to estimate the parameters of one model more 
efficiently than the other. For example, if we were twice as interested in estimating the 
parameters in model 1 as model 2 (or if we had twice as much confidence in model 1 as 
model 2), we might use the exponents 2/pi and l/p 2 in Equation 3.4 instead. 

If the models in question are nonlinear in their parameters, then the information matrix 
depends on the parameter vectors, as described in Section 2.3.1. Hence, to find product 
optimal designs the values of 0\ and 2 must first be specified. The designs described in 
the following sections combine this product criterion with the T-optimality criterion. For 
T-optimal designs, it is already assumed that the value of 0\ is known, so the same 0\ is 
used for the construction of these product optimal designs. The value of 02, however, is 
unknown, and we let it take the value obtained in finding the T-optimal design, i.e. we let 

f7 2 t/ 2 . 

The ability of a design to estimate the parameters of the rth model can be addressed by 
its efficiency compared to the D-optimal design, i.e. its D-efficiency, 

DU0 C) ( mi °- 01 Y Pr r-12 
where £* £>r is the D-optimal design, which maximises \M r (0 r ,^)\. 
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3.1.3 Conditional optimality 

Since the construction of T-optimal designs does not take parameter estimation into account, 
it could be expected that the D-efhciencies of the designs may be quite poor in practice, a 
problem which is investigated in the examples which follow. In order to hnd designs which 
are useful for both parameter estimation and model discrimination, methods are proposed 
involving either the extension of the T-optimal design to include additional support points 
so that the overall design maximises the product criterion in Equation (3.4); or conversely 
the extension of the product optimal design to include points so that the overall design 
maximises the T-optimality criterion. 

Suppose that we have derived a T-optimal design, 

\wf wf ... <J \w* T 

We wish to hnd m further support points which aid in parameter estimation to add to the ex- 
isting design. Of course, the weights wf- ', . . . , w^ will need to be adjusted in order to 'make 
room' for the additional points (all weights must sum to 1). To do this, simply multiply each 
weight by a factor (1 — a), where a G (0, 1) represents the importance the experimenter 
places on parameter estimation. The construction of the 'conditional' optimal design (con- 
ditional on the fixed T-optimal design) then involves finding the additional support points 
tZ+i, ■■■, tZ+ m and weights w*Z+i, • • • , w*Z+ m so that the complete design 



-*T C *T c*P\T c*P\T 



*P\T ) ^1 ' ' ' **n T ?rj T + l ' ' ' ZriT+m 



/i \ *T /i \ *T *P\T *P\T 

(l-a)w 1 ... {l-a)w nT w nT ' +1 ... w nT \ m ^ 
maximises the product criterion from Equation (3.4), and 

pi-«x+.E «'.* P|T = L 

Denote this design by {„ to indicate that the T-optimal design is found hrst, followed by 
additional 'product optimal' points (with a weighting of a on product optimality), and the 
design shall be referred to as T|T-optimal. 

As already noted, for non-linear models these information matrices will depend on the 
parameter values 0i and 2 . In other words, the conditional optimal design will be locally 
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optimal. As in the product optimal design, let 0\ take the value already assumed to be 
known, and for the second model, let the parameters be dictated by the T-optimal design: 
let 2 = 9f. 

The obvious reversal of this process is hx the points of the product optimal design 



t 



*p 



£*P £*P £*P j [ *p 

SI S2 • • • Sn P I I X 



;7 Wo ■ ■ ■ w„ i w 



with weights multiplied by a factor a (again representing the importance placed on parameter 
estimation), and to choose m additional support points such that the complete design 



p*p p*P £*T\P £*T\P 

,. ... j if J ' ' ' p Srjp + l • • • >>n P +r 

' *P „ *P *T|P *T|P 



aw, ... aw' w„ \_i ... w„ ,„. 

1 np np-\-i np-\-m 

maximises the T-optimality criterion from Equation (2.8), where 

np np-\-m 

E*p i v^ * T \ p 1 
aw i + Z^ w i = 1 - 

i=l i=np-\-l 

Denote this design by £ Q to indicate that the product optimal design is found first, followed 
by additional 'T-optimal' points (with a weighting of a on product optimality), and we shall 
refer to the design as T|P-optimal. 

3.1.4 Hybrid designs 

A more straightforward approach to constructing designs which address both problems of 
discrimination and estimation is to simply include design points from both the T-optimal 
design and the product optimal design in such a way that the weight from each point is 
adjusted to account for the priority given to the criterion which that point represents. For 
the np-point product optimal design £* p and the np-point T-optimal design £* T , and for an 
a G (0, 1) again representing the importance placed on parameter estimation, such a design 
is given by 



i 



£*P £*P £*T £*T 

hybrid ____ ) ^1 " " " ^ n P ^1 ^ n T 

aw* p . . . aw*^ p (1 - a)w* T ... (1 - a)w^ 
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Note that, by definition, 

rip nrp 



5]»; p + Hi- Q K T = i. 



»=i »=i 



This approach has the advantage that it does not rely on further optimisation once the 
T-optimal and product optimal designs are known. 



3.2 Construction of designs and examples 

3.2.1 Algorithms 

The examples that follow involve the numerical optimisation of the criteria presented in 
Section 3.1. The algorithm used in each case is an implementation of simulated annealing 
for continuous variables, as described in Section 2.4.1. 

In terms of the T-optimal designs, a special note must be made of the minimisation of 
the function in Equation (3.2): at each iteration of the annealing algorithm, this was imple- 
mented using MATLAB's Optimisation Toolbox. The functions fminsearch and lsqnonlin 
may both be used for this purpose, fminsearch minimises any scalar function of several 
variables, whereas lsqnonlin is designed for minimisations of the form 

i 

mmf(x) = y2f t (x) 2 + L 
»=1 

where L is a constant. Depending on the model structure and the initial estimate of 6 2 , 
for some designs fminsearch found the best 2 out of the two functions (i.e. it gave a 
better fit for for model 2), but for other designs lsqnonlin was better. Thus the criterion 
calculated may be larger than the true value of the criterion, which results in the algorithm 
converging to less than optimal designs. Due to this inconsistency it was necessary to use 
both functions to perform the minimisation, after which the smallest minimum was selected. 
This heuristic approach appears to find a stable minimum and therefore avoids the problem 
with convergence. 
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3.2.2 Linear example 

In this first example, the results of Example 20.3 of Atkinson and Donev (1992) are repro- 
duced, and then product, conditional and hybrid optimal designs are found. The two models 
which we wish to discriminate between are 

771(2;, 0i) = flu + 9 12 e x + 9 13 e~ x (-1 < x < 1, -00 < B ld < 00), 

and 

r] 2 (x, 2 ) = 2 i + 9 22 x + 9 23 x 2 (-1 < x < 1, -00 < 9 2j < 00). 

The first model is assumed to be true for the construction of the T-optimal design, with 
parameter values On = 4.5, 9\ 2 = —1.5 and 9\ 3 = —2, as in Atkinson and Donev (1992). 
The T-optimal design (identical to the one found by Atkinson and Donev), product optimal 
design, conditional designs (with either the T-optimal or product optimal design first, for a 
range of a), hybrid designs (also for a range of a), and their D- and T-efficiencies are given 
in Table 3.1. 

The support points of the T\P optimal designs seem to be identical to those of the 
hybrid design, whereas the T|T-optimal design is slightly different. It should be noted that 
in terms of T e 6 ff , the conditional designs and hybrid designs are all very similar. The P\T- 
optimal designs are worse than the T| T-optimal and hybrid designs in terms of T e fe ff , but 
more T-efficient. 

The efficiencies for a range of conditional and hybrid optimal designs, including the 
product design (a = 1) and the T-optimal design (a = 0) are shown graphically in Figure 
3.1. The three methods all produce very similar results. There is an obvious increasing 
trend in T-efficiencies of all designs as we place more importance on parameter estimation 
(i.e. as a decreases) and an almost linear increase in both T-efficiencies as we favour model 
discrimination (i.e. as a increases). A reasonable trade-off between the two objectives seems 
to be at around a = 0.2 for the T|T-optimal design, and at around a = 0.35 for the other 
optimal designs. 
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:*T 



Design, £* 



ar 



T-optimal 
-1 -0.6693 0.1439 0.95691 
0.2527 0.4281 0.2469 0.0723J 

D-optimal, model 1 

cDl __ r-i o i 



t 



e 



1/3 1/3 1/3 
D-optimal, model 2 

*D 2= f-l 1 

\l/3 1/3 1/3 

D-optimal, product 
p_j-1 1 

1/3 1/3 1/3 



Efficiency 



n 1 n 2 T" T ft 

^eit flgg ^eff ^eff 



0.6069 0.6179 1 



0.9082 



0.9082 



0.9082 



Sa=0.25 — 

^*P|T _ 
Sa=0.5 — 

c*P\T _ 

Sa=0.75 — 



P| T-optimal 

x* T -1 0.0682 1 

0.75™* T 0.0126 0.0178 0.2196 

x* T -1 0.0448 1 

0.5w* T 0.1172 0.1209 0.2619 

x* T -1 0.0218 1 

0.25w* T 0.2244 0.2261 0.2996 



0.8340 0.8407 0.8193 0.9885 



0.8963 0.9012 0.6928 0.9740 



0.9511 0.9537 0.4234 0.9479 



p *T\P _ 

Sa=0.25 — 

p*T\P _ 
Sa=0.5 — 

p *T|P _ 
?a=0.75 — 



T|P-optimal 
-1 -0.6693 0.1438 0.9570 
0.25™* p 0.1896 0.3211 0.1852 0.0542 



x 



*p 



X 



*p 



-1 -0.6693 0.1438 0.95701 

0.5w* p 0.1264 0.2141 0.1234 0.0361J 

x* p -1 -0.6693 0.1437 0.9570 

0.75™* p 0.0632 0.1070 0.0617 0.0181 



0.7648 0.7721 0.9058 0.9770 



0.8691 0.8740 0.7134 0.9541 



0.9448 0.9473 0.4213 0.9311 



^hybrid 
£0=0.25 



Hybrid 
,*P 



,*T 



X~* X 

0.25™* p 0.75w* T 



^hybrid J *^ 

^ a=0 - 5 ~ \0.5w* p 0.5w* T 
x* p x* T 

0.75w* p 0.25w* T 



^hybrid I %" -'C 

Sa=0.75 — 



0.7648 0.7721 0.9058 0.9770 



0.8691 0.8740 0.7134 0.9541 



0.9448 0.9473 0.4213 0.9311 



Table 3.1: Efficiencies of near-optimal designs for two competing linear models. 
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0.2 0.4 0.6 o.e 

{<- discrimination) a (estimation ->) 




0.2 0.4 0.6 

{<- discrimination) a (estimation - 




0.2 0.4 0.6 O.E 

(*- discrimination) a (estimation ->) 



FIGURE 3.1: Efficiencies of designs for two competing linear models: (a) P|T-optimal designs; 
(b) T|P-optimal designs; (c) hybrid designs. In each case, a represents the importance placed on 
parameter estimation. 
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3.2.3 Nonlinear example: two models for decay 

Now consider the problem of finding an optimal design to discriminate between two models 
for decay as described in Example 20.1 of Atkinson and Donev (1992), namely the exponential 
decay model 

rji(x, 6i) = exp(—6ix) (x, 6i > 0), 

and the inverse polynomial 

V2(x,e 2 ) = —— (x,e 2 >o). 
i + o 2 x 

Again let the hrst model be true, with 9\ = 1, as in Atkinson and Donev (1992). The same 
process as the previous example was followed to hnd the T-optimal design, the product 
optimal design and the conditional and hybrid optimal designs. The results are summarised 
in the same manner in Table 3.2 and Figure 3.2. (The T-optimal design is again confirmed 
by the results given for this example in Atkinson and Donev (1992).) 

As for the designs for the linear models, the T| T-optimal and hybrid designs appear to 
be identical, whereas the P\ T-optimal designs differ slightly. 

Trends similar to those seen in the previous example are shown here, with values of a 
around 0.5 giving similar efficiencies for parameter estimation and model discrimination in 
each case. In this case, however, there seems to be a bigger trade-off between the T-optimal 
and product optimal designs than the trade-off in the linear models. That is, as a increases 
the T-efficiencies increase and the T-efficiencies decrease at a greater rate than for the linear 
models. 

3.2.4 Nonlinear example: two models for rise and decay 

We now consider a pair of slightly more complicated models, used in the case study of Section 

3.13 of Bates and Watts (1988). The hrst is the three-parameter quadratic Michaelis-Menten 

type model 

9 T 

m(x, e x ) = - - x n + 0isx2 (x > 0, -00 < e l3 < 00), 

and the second is the exponential difference model 

V2 (x, 2 ) = 6 21 (e~ 023X - e- 02 * x ) (x > 0, -00 < 2j < 00). 
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Design, £* = 



x 



: *T 



T-optimal 
0.327 3.34 

0.3345 0.6655 



Efficiency 

-^eff ^eff ^eff ^eff 



0.2065 0.4472 1 



D-optimal, model 1 
'1' 



t*D 



e ui = 



D-optimal, model 2 



1 0.8224 0.0403 



p Di = fO.5324 1 


0.7221 


1 





0.7326 


.D-optimal, product 










^P = J0.79951 


0.9545 


0.9212 





0.2393 


P T-optimal 










*p\t f x* T 0.8501 
^ a=0 - 25 ~~ \o.75w* T 0.25 J 


0.3986 


0.5598 


0.7787 


0.7932 


*p\t f x* T 0.8251 
* Q=0 - 5 ~ [0.5w* T 0.5 J 


0.5861 


0.6783 


0.5397 


0.6022 


*p\t f £* T 0.8101 
^ Q=0 - 75 ~ \o.25w* T 0.75 J 


0.7710 


0.7992 


0.2791 


0.4187 


T P-optimal 










*t|p f z* p 0.327 3.34 1 
^ Q=0 - 25 ~ \o.25tu* p 0.2509 0.4991 J 


0.3935 


0.5657 


0.7893 


0.8098 


*t|p f x* p 0.327 3.34 1 
^ Q=0 - 5 ~ \0.5u>* p 0.1672 0.3328] 


0.5805 


0.6842 


0.5461 


0.6196 


*t|p f z* p 0.327 3.34 1 
^ Q=0 - 75 ~ \o.75u;* p 0.0836 0.1664] 


0.7675 


0.8027 


0.2809 


0.4295 


Hybrid 










^.hybrid J "^ ^ I 

^=0.25 - jo.25™* p 0.75™* T ] 


0.3935 


0.5657 


0.7893 


0.8098 


^.hybrid J "^ ^ I 

?a=0.5 ~ | .5u>* p 0.5™* p ] 


0.5805 


0.6842 


0.5461 


0.6196 


^■hybrid J "^ ^ I 

^= - 75 " \0.75™* p 0.25™* p ] 


0.7675 


0.8027 


0.2809 


0.4295 



Table 3.2: Efficiencies of near-optimal designs for two competing models for decay. 
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FIGURE 3.2: Efficiencies of designs for two modeis for decay: (a) P|T-optimai designs; (b) T\P- 
optimal designs; (c) hybrid designs. In each case, a represents the importance placed on parameter 
estimation. 
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Figure 3.3: Two candidate models for rise and decay. 



These can be seen as an extension of the models of the previous example, as they involve an 
initial rise in response, followed by a decay. Figure 3.3 shows the similarity of the models, 
despite their structural disparity. 

Again, to hnd the T-optimal design, assume that the first model is true, and arbitrarily 
select #ii = #i2 = #13 = 1. The resulting T-optimal design is given in Table 3.3, along with 
the product design and a selection of conditional and hybrid optimal designs. 

Comparing the conditional and hybrid designs, the T| T-optimal designs appear yet again 
to be identical to the hybrid designs. The T|T-optimal designs again differ very slightly to 
the hybrid designs. It is interesting to note that in the a = 0.25 case, where little emphasis is 
placed on parameter estimation, the T|T-optimal design contains only two additional points, 
whereas for higher values of a, three additional support points are needed to maximise the 
product criterion. 

It can be seen again that the T-optimal design has a poor T-efficiency for each model 
and the .D-optimal product design performs quite well in terms of parameter estimation and 
model discrimination when comparing T e fe ff (although not in terms of T e ^). 

Figure 3.4 shows the decreasing trend in T-efficiencies and the increase in T-efficiencies 
as more importance is placed on model discrimination (i.e. as a increases). Again, this 
trade-off between designs is more pronounced for these nonlinear models than it is for the 
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linear models. 

3.3 Discussion 

We have seen that although T-optimal designs can be inefficient for parameter estimation, 
and conversely D-optimal designs may be unsuitable for model discrimination, we are able 
to construct designs using compound criteria which offer a suitable compromise between the 
two objectives. Through the conditional and hybrid designs described here, the trade-off 
between estimation and discrimination can be simply specified by the choice of a. 

T e a ff and T e 6 ff always agree with respect to the direction of change. T e a ff , however, always 
seems to show the greatest degree of change for a given change in a. Regardless of the 
choice of T-efficiency, the results are the same in terms of model discrimination: the lower 
the value of a (i.e. the more emphasis that is placed on model discrimination), the larger 
the T-efficiency of the conditional and hybrid designs. 

The difference between the two types of conditional designs and the hybrid designs is 
minor. Indeed, for all three examples presented here, the conditional designs with the product 
optimal design points fixed are essentially equal to the corresponding hybrid designs. The 
use of hybrid designs is hence preferable, as they involve no further optimisation once the 
T-optimal and product optimal designs are known, and can be generated very quickly. 

These results show that T>-optimal designs (including product designs) with few support 
points offer little value in terms of discrimination. In these cases the additional support 
points in the conditional and hybrid designs give a dramatic improvement in T-efficiencies. 
It would appear that for models whose D-optimal designs have several support points (where 
the number of parameters in each model is larger), a product optimal design is efficient for 
both model discrimination and parameter estimation. Certainly more than a single support 
point would be required for effective model discrimination, regardless of the number of 
parameters in the models. Of course, a loss in parsimony (due to an increase in the number 
of distinct support points) may incur an additional cost in many practical situations. We 
may prefer to penalise the criteria presented here for less parsimonious designs. 

The methods described in this chapter have been exemplified by the use of several sets 
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Design, £,* = 



x 



w 



t*T 



C 1 = 



T-optimal 
0.1730 1.131 5.104 



28.C 



0.0187 0.1216 0.2163 0.6434 



Efficiency 

-^eff -^eff ^Tff ^eff 



0.2428 0.2333 1 



D-optimal, model 1 
Dl _ f 0.2560 1 3.9061 
* \ 1/3 1/3 1/3 J 



D-optimal, model 2 



;*D 



£*L>2 — 



0.2453 1.272 8.932 



;*P 



1/3 1/3 1/3 

.D-optimal, product 
0.2578 1.143 6.412 
1/3 1/3 1/3 



1 0.7581 0.8162 



0.8202 1 



0.6565 



0.9279 0.9512 0.8574 



c*P\T 
?q=0.25 

c*P\T _ 
Sa=0.5 — 



P| T-optimal 
x* T 0.2594 1.128 
0.75u>* T 0.1683 0.0817J 
x* T 0.2590 1.141 6.910 1 
0.5w* T 0.2289 0.1710 0.1001J 



*p|T _ f x* T 0.2584 1.143 6.578 
^ Q=0 - 75 ~ \0.25to* T 0.2814 0.2525 0.2161 



0.5541 0.5114 0.7824 0.9603 



0.6770 0.6632 0.5254 0.9139 



0.8014 0.8088 0.2634 0.8831 



c*T\P _ 
Sa=0.25 — 

c*T\P _ 
S Q =0.5 — 

c*T\P _ 
?a=0.75 — 



T|P-optimal 



,*P 



x* 0.1730 1.131 5.104 28.08 

0.25™* p 0.0140 0.0912 0.1622 0.4826 

x* p 0.1730 1.131 5.104 28.08 V 

0.5to* p 0.0093 0.0608 0.1081 0.3217J 



a; 



*P 



0.1730 1.131 5.104 28. C 



0.75u;* p 0.0047 0.0304 0.0540 0.1609 



0.4946 0.4746 0.7821 0.9644 



0.6597 0.6465 0.5251 0.9287 



0.8005 0.8025 0.2635 0.8931 



^hybrid _ 
Sq=0.25 " 

^hybrid 

Sa=0.5 — 

^hybrid 
Sa=0.75 " 



Hybrid 
,*P 



,*T 



X X 

0.25w* p 0.75w* T 



,*p 



,*T 



X X 

0.5w* p 0.5w* T 



,*p 



,*T 



X X 

0.75tu* p 0.25™* T 



0.4946 0.4746 0.7821 0.9644 



0.6597 0.6465 0.5251 0.9287 



0.8005 0.8025 0.2635 0.8931 



Table 3.3: Efficiencies of near-optimal designs for two competing models for rise and decay. 
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FIGURE 3.4: Efficiencies of designs for two models for rise and decay: (a) P|T-optimal designs; 
(b) T|P-optimal designs; (c) hybrid designs. In each case, a represents the importance placed on 
parameter estimation. 
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of fairly simple nonlinear models. The next chapter puts some of this work to use in the 
context of a considerably more complex pair of models incorporating multiple responses, 
random parameters, and ordinary differential equations with no analytic solution. 



Chapter 4 

Optimal design for nonlinear mixed 
effects models with multiple responses 



The methods of optimal design for nonlinear models in the previous chapter will now be 
applied to a real-world example of a pharmacokinetic (PK) study, in which the aim is to 
choose a model which describes the time course of the changing concentration of a drug 
in the body, and to efficiently estimate the parameters in the model. As with many PK 
studies, the models involved are rather complex, involving random effects (to allow the PK 
parameters to vary between subjects) and multiple responses (we are interested in modelling 
the concentration of both the drug and its main metabolite). Optimal design for nonlinear 
mixed effects models has been given some attention in the literature recently, particularly in 
regards to PK experiments (Mentre et al., 1997; Retout and Mentre, 2003; Green and Duffull, 
2003). The use of multiple responses, however, introduces some theoretical challenges and 
also adds computational complexity to an already formidable problem. 

4.1 Introduction 

In this chapter, the problem of optimal design for nonlinear models with multiple responses 
is addressed through a real example in which an experiment is designed for the study of 
the PK of the drug itraconazole. Part of the aim of the study is to discriminate between 
two candidate models, hence some ideas of the previous chapter are employed here. The 
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inclusion of multiple responses in the models motivates the investigation of techniques for 
incorporating all responses into the design process. 

4.1.1 Example: Background and significance 

Allergic bronchopulmonary aspergillosis is a complication of cystic fibrosis (CF), occurring 
in approximately 10% of patients mostly after the age of 6 years (Moss, 2002). A study has 
shown that patients achieved an improvement in the inflammatory response when itracona- 
zole was given in combination with systemic glucocorticosteroids (Stevens et al, 2000). 

Although relatively little is presently known about the pharmacokinetics of itraconazole 
in patients with CF, there is evidence which suggests that it is different to the PK of the drug 
in non-CF patients (Hedman et al., 1988; Jusko et al, 1975; Finkelstein and Hall, 1979). 
It is also known that there is often large variability in the pharmacokinetic parameters (eg. 
absorption rate constant, volume of distribution, and clearance) of drugs administered to CF 
patients (Reed, 1997; Spino, 1991). It has been observed that high doses of itraconazole are 
required in CF patients for the treatment of allergic bronchopulmonary aspergillosis (Skov 
et al, 2002). 

Itraconazole is extensively metabolised by the hepatic cytochrome P450 isoenzymes and 
has at least 30 metabolites, each representing less than l%-5% of the dose. The main 
metabolite is considered to be hydroxyitraconazole, which also exhibits antifungal activity 
against a variety of species (Heykants et al., 1989; Hostetler et al, 1993; Van Cutsem, 1989). 

4.1.2 Structural models 

The pharmacokinetics of itraconazole and its main metabolite has been shown to exhibit the 
behaviour of two linked compartmental models. Figure 4.1 shows the general structure of 
the models, including the rates of mass transfer between each compartment. The amounts 
of itraconazole in the gut, central and peripheral compartments are given by Ai, A 2 and A 3 , 
respectively; the amount of its main metabolite (hydroxyitraconazole) is denoted by A&. The 
elimination from the parent compartment to a compound other than hydroxyitraconazole 
(shown in Figure 4.1) has been described by some as linear (elimination rate F20CL2/V2) 
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FIGURE 4.1: Compartmental model of the pharmacokinetics of itraconazole and hydroxyitra- 
conazole. 



(Koks et al, 2002), and by others as nonlinear (elimination rate F 2 oV ina , x /(K m + A2/V2)) 
(Barone et al, 1993; Van Peer et al., 1989; Heykants et al., 1989; Hardin et al, 1988). 

The differential equations describing the two models (linear and nonlinear) from Fig- 
ure 4.1 are shown in Equations (4.1-4.8). 

Model 1. The linear model: 



cL4i 

~dT 
ch4 2 

dt 



0M3 
dt 

dA 4 
dt 



= -FuKA.-FuKA, 

= F 12 k a A 1 + ^A 3 -^A 2 -F 2A ^A 2 

v 3 \/ 2 \/ 2 

F CL2 A 

V 2 

= F 1A k a A 1 + F 2A —^A 2 -—^A A 



V, 



Va 



(4.1) 

(4.2) 
(4.3) 
(4.4) 



42 Nonlinear mixed effects models with multiple responses 

Model 2. The nonlinear model: 

— - = -F l2 k a A l -F u k a A l (4.5) 

at 

dA * IP u A ■ Q A Q A j? CL ^ A 

—— = F 12 k a A x + —A 3 - —A 2 - F 2i ——A 2 
at V 3 V 2 V 2 

_ F20 Vma ; A 2 (4.6) 

20 K m + A 2 /V 2 { ' 

^ = *A 2 -9-A, (4.7) 

dt V 2 V 3 3 V ' 

— 4 - = F u k a A l + F 2A —^A 2 -—^A A 4.8 

dr V 2 l/ 4 

Note that since the differential equations describe the rates of change of the amount of 

drug in each compartment (rather than the concentration), the last term in Equation (4.6) 

describing the Michaelis-Menten elimination is slightly different to the form commonly used. 

The divisor still involves the concentration (i.e. A 2 /V 2 ), but we multiply by A 2 to give the 

rate of change in the amount. The units of V max and K m are still of the form 'mass/ (volume 

x time)' and 'mass/volume', respectively. In both sets of differential equations, the initial 

conditions are 

Ai(0) = dose, A 2 (0) = A 3 (0) = A 4 (0) = 0. 

4.2 Theory 

A model must be formed to describe the behaviour over time of the amount of the parent drug 
and its metabolite (A 2 and A4), and hence the behaviour of their concentrations (C 2 = A 2 /V 2 
and C 4 = A 4 /V A ). 

It is assumed that the fractions F 12 , F 14 , F 24 and F 20 are known hxed constants (where F t j 
is the fraction of the drug delivered from compartment i to compartment j for j ^0, and F 20 
is the fraction of the drug eliminated from compartment 2) and that their values are available 
from previous experiments. The parameters to be estimated which are common to both 
models are therefore the volumes of distribution (the amount of drug in the compartment 
divided by the concentration in the blood) V 2 , V3 and V4; and the clearances (the volume of 
plasma from which the drug is completely removed per unit time) CL4, CL 24 and Q. The 
linear model has one additional clearance parameter, CL 2 , while the nonlinear model also 
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has parameters V max and K m . Considering now that interest lies in the pharmacokinetics 
of itraconazole in both capsule and solution form, there are actually four models under 
consideration: the linear model for capsules, the linear model for solutions, the nonlinear 
model for capsules and the nonlinear model for solutions. The models for capsules and 
solutions have the same structure, they only differ in terms of their parameter values. Hence 
there are four sets of parameter values: 

/3 1>c = (kl' c , V^, Vi> c , V^ c , CLY, CL\l Q 1 ^, CL 1 /)', (4.9) 

f3 2 ' c = (kl>\V 2 2 >\VtX'\CLl>\CLl':,Q 2 >\V*>: x ,K 2 m c y, (4.10) 

P 1 " = {k?X*X*,Vl* 1 CLl* t CL% t Q 1 * t CLi')' t (4.11) 

P 2 ' s = (kl'%V 2 2 '%V^X' s ,CLl' s ,CLl k s ,Q^,V^,K^)', (4.12) 

where for (3 U the superscripts u and / indicate the structural model u (u = 1 for linear, 
u = 2 for nonlinear) and the form of dose / (/ = c for capsule, / = s for solution). 

For the time being, suppose that only one structural model and only one form of dose is 
under consideration. The superscripts u and / are dropped, and the p- vector of parameters 
is denoted by (3. Let Oi be the vector of parameter values for the i th individual. Then for a 
vector of n^ sampling times ^ = (tn, . . . ,t in .)', suppose that the n^-vector of observations is 
modelled by 

Employing a nonlinear mixed effects model, it is assumed that each of the parameters varies 
randomly between individuals. Let (3 be the vector of hxed effects and let bi be the vector 
of random effects for individual i. So the random effects are written as either Oi = (3 + bi 
(for additive random effects) or Oi = /3exp(6 i ) (for exponential random effects). Thus the 
structural model can be written in an alternative form, T]((3, bi,£i). It is assumed that 
bi r*j N(0, ft), where Q = diag(cji, . . . , uj p ) and is the vector of zeros. It is also assumed 
that, as in Retout and Mentre (2003), the variances of the normally distributed error terms 
Ci (conditional on the random effects bi) have additive and proportional components, i.e. 
Ci\bi ~ N(0, dia,g{a a+cf pr}{6 i, £;)) 2 for some a a and up. The vector of all variance parameters 
is A = {uj\, . . . , uj p , a a, <Tp)' Denote the entire vector of model parameters by \I/ = ((3', X')'. 
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4.2.1 Optimal design 

Model discrimination aside (momentarily), the objective of the experiment is to obtain esti- 
mates of the relevant parameters with the greatest possible efficiency. For this purpose, the 
Fisher information matrix is employed. 

First some slightly different notation is introduced for use in this chapter. For consistency 
with the existing literature on optimal design for PK models, an experimental design (the 
'population design' in the PK literature) with N independent subjects is redefined as S = 
(£i) ■ ■ ■ ) £jv)'> wriere £j is the 'elementary design' (eg. set of sampling times) for individual i. 
The population design is usually limited to Q groups, with N q subjects allocated to the qih 
group: 

€l * 2 '■■ *°1> (4.13) 

^i N 2 ... N Q \ 

with£?=iJV, = iV- 

Recall from Section 2.3.1 that the Fisher information matrix is given by 

where £(&; y) is the log-likelihood of the vector of observations y = (y\, . . . , yjv)' for the 
population parameters \P. This may be expressed as the weighted sum of the information 
matrices for the Q elementary designs: 

Q 

M(*,Z) = J2N q M(*,£ q ). 

Maximising the determinant of M(ty, S) leads to the D-optimal design, as described in 

Section 2.3.2. 

Unfortunately, for nonlinear mixed effects models, the information matrix cannot be 

written down in closed form. For an elementary design £, Mentre et al. (1997) have shown 

that by using a first-order Taylor series expansion of T)(0,£) around the expectation of the 

random effects, an approximation to the Fisher information matrix for an elementary design 

^ is given by 

r A(E,S) C(E,S) 

C'{E,S) B(E,S) 



M(¥,0«^ 



(4.14) 
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where 



(A(E,S)) r 
(B(E,S)) 7 

(C(E,S)\ 



BE' ^dE 

2 - - b tt^, m,n = l,...,p 



= tr 



= tr 



dS -J95 __! 



3A m 3A ra 

m,n = 1, . . . ,p + 2 
95 ^dS " 



d\ m d(3 n 

m = 1, . . . ,p + 2, n = 1, . 



-,P 



(4.15) 



(4.16) 



(4.17) 



Here 22 and 5 represent approximations to the expectation and variance of the observations, 
respectively: 



E(y)^E = r,(/3,0,£) 



Var(y 



5 = 



db> 



O 



36' 



(4.18) 
+ dmg(a A + ap V ((3,0,ti)) 2 (4-19) 



For computational efficiency, it is assumed that S is independent of (3, as in Retout et al. 
(2002), which makes C(E, S) a matrix of zeros. From the simulations carried out in Sec- 
tion 4.4.2, the impact of this assumption on the hnal design is found to be minimal. This is 
a worthwhile assumption, as it will be seen that this is a computationally intensive optimi- 
sation, and any justifiable simplification is welcome. 



4.2.2 Parameter estimation in multiresponse situations 

Consider for a moment that the observations are in fact modelled by a simpler fixed effects 
model with multiple response types. Suppose that the outcomes of an experiment on a single 
subject with r simultaneous responses can be modelled as 



y 



^ = V ia) (0,U 



.(«•) 



(a = l,...,r;i = l,...,n), 



(°)> 



where the = (6\, . . . , 6 P )' is now a p-vector of fixed effects parameters; E(q ) = 0; 



.(°U&h 



,(«)\2 



E(e\ a) e °0 = for i ± j; E((e^) 2 ) = < = a aa ; and E(e^e^) = p ah o a o h = a ab for a + b. So 



(")Abh 
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the covariance matrix of responses for the i sampling time is given by 



S — {&ab}a,b=l,...,r — 



Oil 0"12 • • • °lr 
C21 CT22 - - - CT2r 



°Yl a r1 



a,. 



where cr a6 = a ba . Let £ : = {cr afe } a)6= i v .. )r .. 

Again, denote the elementary design by £ = (t 1; . . . , £„)'. Draper and Hunter (1966) have 
shown that the information matrix for multiple responses is given by 



M (^) = E5> afei7 ^ 



(4.20) 



where 



» 



V(0,*i 



a=l b=l 



,(«>) 



ww* 



a = 1, . . . , r. 



90 90 

The D-optimal design then maximises the determinant of M(0, £) as given in Equation (4.20). 
It follows that if the information matrix in Equation (4.20) is simplified to 

r r 
a=l b=l 

i.e. if it assumed that the a ab are all equal, then this is equivalent to maximising \F'F\, 
where 

r 

F = Y, F <> 
a=l 



80 80 

This will greatly simplify computation, as all that is required for the criterion is to sum the 
r responses in the model, and then calculate the Fisher information matrix as in the single 
response case. While this assumption may not be realistic for this study, there appear to 
be no published values for the covariance of itraconazole and hydroxyitraconazole, so this 
algebraically convenient (and hence computationally simple) simplification is reasonable in 
these circumstances. Again, the extensive simulation studies in Section 4.4.2 demonstrate 
that this assumption has little impact on the hnal design. 
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This idea is extended to the more complex nonlinear mixed effects models with het- 
eroscedastic error. To accomplish this, the Fisher information matrix described in Equa- 
tions (4.14-4.19) is computed, replacing 77 (/3, 0, £) with Yla=i V^Hft, 0, £). In the itraconazole 
example, the only responses of interest are the concentrations in the 2nd and 4th compart- 
ments. Hence the sum is only over a = 2,4. 

4.2.3 Model discrimination 

There are two competing structural models to describe the pharmacokinetics of itracona- 
zole. The hrst involves linear elimination of the parent drug, the second involves a nonlinear 
Michaelis-Menten process. The T-optimality criterion for model discrimination, described 
in Section 2.3.3, could be applicable here. However, T-optimality is quite expensive com- 
putationally (which is a major concern in this complex scenario), and is not efficient for 
parameter estimation (see Section 3.2 for some examples of this). A viable alternative is 
to use the product of the determinants of the information matrices, raised to the power 
of the reciprocal of the number of population parameters (or equivalently, the product of 
efficiencies, as suggested by Atkinson and Cox (1974)). We have also seen from Section 3.2 
suggestions that in some circumstances this method can produce designs which are useful 
for both goals: parameter estimation and model discrimination. This was evaluated by the 
simulations in Section 4.4.2. 

4.2.4 Combining capsule and solution responses 

As mentioned in Section 4.1.2, there are four models under consideration. However, only 
the linear and nonlinear structural models must be discriminated between. The structure 
of the models for capsules and solutions only differ in the values of the parameters, not the 
structure of the models. To do this, a capsule dose is administered, followed by a solution 
dose at a time, determined by the clinical staff, after which it is assumed that there is no 
residual amount of the capsule dose. It was also assumed by the clinical staff that there is 
no effect of the order of the doses. An elementary design £ includes sampling times for the 
capsule (£ c ) and solution (£ s ). The responses of subject i to both doses are combined to 
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give the overall response. So the parameters for capsules and solutions for a given structural 
model must also be combined, i.e. for u = 1 and 2 (linear and nonlinear models, respectively), 
U = ([0 u ' c y, {6 U ' S )')' . In the context of nonlinear mixed effects models with additive random 
effects, we now have the vector of parameters in model u for subject i given by 

o- = mn'Aoryy 

= ((/3 M < C + 6f' c )', (/3 M < S + b^yy (4.21) 

= (3 u +b? (4.22) 

where f3 u = ((/3 M ' C )', ((3 U ' S )J and b? = ((&"' c )', (K' S )J, with the random effects for dose 
form / (/ = c, s) for each subject being normally distributed with zero mean and with 
covariance matrix Q"'^ = diag(u;™ , . . . , w"^). The combined covariance matrix for model u 
is then ft u = diag(fT' c , Q u ' s ). Equations (4.21) and (4.22) may be adjusted for exponential 
random effects by replacing (3 + b t with (3exp(bi) (for any superscripts on (3 and &;). 
The entire vector of population parameters under model u is then 

^u = ((^ r «.cy j (^«.»)')', (4.23) 

with 

W — \{P ) > W l . ■ ■ ■ 5 W p «,/ . °A I °> J ■ 

In this notation, the combined responses to the capsule and solution doses for compart- 
ment a (where a = 2 or 4) can be written 

vi a) ((3 u , &?.&) = (e } (/3 u ' c , ar.sy. ^ a) (^ >s , &r. £)')'■ 

4.2.5 Final criterion 

Hence the overall product criterion to be maximised is given by 

D P (*\* 2 ,S) = \M 1 {^\'E)\ iIpi \M 2 {^\'E)\ iIp \ (4.24) 

where 

Q 

M«(*«, S) = J2 N q M u (* u , £ q ), u = 1, 2. 

<7=1 
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\I/ M is the vector of population parameters for the u th model. For the u th model, the elemen- 
tary information matrix is given by 



M U (* U ,U 



1 



A(E U ,S U ) C(E u ,S l 
C'(E U ,S U ) B(E U ,S Z 



with A(-, •), £?(•, •) and C(-, •) as described in Equations (4.15-4.16), and 



E(y)«E« = ^t/i a) 09-,O,^) 



a=2,4 



Var(y) sa 5 M = 



(»)# 



9E fl=2 , 4 C'(/3 u ,o^ 



db> 



ft z 



(«•), 



96' 



diagL + ^^^/^O,^)) , 

V a=2,4 / 



(4.25) 



(4.26) 



where the sums are over the responses for compartments 2 and 4. 

4.3 Methods 

In this section a description is given of the constraints of the sampling design, procedures 
used in selecting appropriate parameter and constant values, a number of simplifications to 
the models (to fit design constraints), and a brief overview of computational methods used 
to optimise the design criterion. 



4.3.1 Design constraints 

A number of limitations on the experimental design were imposed by the clinical staff for 
reasons of time and expense. A maximum of thirty patients were to be enrolled and studied 
on two occasions. The number of blood samples was also constrained to be four samples per 
occasion. A single 200 mg dose of the drug in capsules was to be given to each patient on 
the first occasion, followed by a single 200 mg dose in solution on the second occasion. 

A time constraint of approximately two months was placed on the design of the experi- 
ment, from exploration of the literature to computation of the final design. 
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4.3.2 Patients 

Thirty adults with CF who are admitted to hospital for treatment of a chest exacerbation 
will receive two doses of itraconazole (one 200 mg dose in capsules and one 200 mg dose as 
an oral solution) for this study. 

The participating patients are divided randomly into three groups, with each group being 
assigned one of three elementary sampling designs. 

Accordingly patients will be required to take two Sporonax® Capsules as one dose 
(200 mg itraconazole), which is a standard adult dose. Four blood samples (0.5 mL) will be 
then drawn by hnger prick according to the optimum sampling times of the design. 

After a wash-out period of at least three days (after the initial dose), the same dose of 
200 mg (20 mL of 100 mg/10 mL solution) itraconazole would be taken as Sporonax® Oral 
Solution. Four blood samples (0.5 mL) will again be drawn by hnger prick according to the 
optimum sampling times of the design. 

4.3.3 Simplification of models 

The design optimisation problem under consideration is quite computationally intensive. 
Any justifiable reduction in the number of parameters to be estimated (which leads to a 
reduction in the dimension of the Fisher information matrix) is greatly beneficial. 

It is a fundamental belief in pharmacology that the different forms of dose will affect 
the input process only. So rather than treat the model parameters for capsule and solution 
doses as being entirely separate, as in Equations (4.9-4.12), it is proposed that there are 
only two sets of parameters to be estimated, the hrst for the linear model and the second 
for the nonlinear model: 

/3 1 = {klVlV^Vl,CL\,CL l 2i ,Q\CL\)', (4.27) 

/3 2 = (kl,V 2 2 ,Vi,VlCLlCLl„Q 2 ,Vl ax ,K 2 J. (4.28) 

The parameters k™ represent the absorption rate for capsules (i.e. they are strictly k™'°). 
The distinction between absorption rates is instead addressed by a multiplicative factor 
Fk a = 0.9452/0.4159 ~ 2.273, the ratio of the absorption rate for solutions and the absorption 
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rate for capsules (not to be estimated). So the response for the solution under model u is 
given by using the absorption rate Fk a k™. The smaller of the two absorption rates is used 
for this design, as it will be the most difficult to estimate. 

In addition to the computational intensity of the problem, the design constraints (outlined 
in Section 4.3.1) mean that not all of the parameters (of which there is still a large number) 
will be estimable. Preliminary investigation of expected standard errors was carried out 
by calculating the information matrix for an empirical 'rich' sampling design, in which 14 
samples are taken from each patient over almost the entire dosing interval (72 hours) with 
sampling times given by (0.01, 0.5, 1, 1.5, 2, 4, 6, 8, 12, 18, 24, 36, 48, 70). The expected 
standard errors are given by the square roots of the diagonal elements of the inverse of the 
information matrix. This investigation showed that under both linear and nonlinear models, 
the parameters CL 2 4, Q and V3 will have very imprecise estimates under the constraints 
of the sampling design. Further to this, K m is not estimable with any precision under the 
nonlinear model. These four parameters are hence treated as hxed, and are not considered 
in the design process. 

To further simplify computation, the residual amounts of drug and metabolite in each 
compartment at the time of the second dose were assumed to be insignificant (i.e. there is 
no carryover effect). This was found to have a negligible effect on the criterion, which was 
calculated for the design mentioned above, both including and ignoring the residual dose. 

4.3.4 Parameter and constant values 

Finding the D-optimal design for any model which is nonlinear in its parameters requires 
prior specification of the model parameters. Such designs are known as 'locally' D-optimal. 
The parameter values (and values of other constants) used in the construction of the 
design were taken from a variety of sources. To find the local optimum design, parameters 
which are common to both linear and nonlinear elimination models (eg. V2, CL 4 ) are given 
the same value for both model types, despite the treatment of the parameters as separate 
entities when constructing the design criterion. For example, V^ and V" 2 2 are not the same 
population parameters, but for convenience the same value is assigned to both parameters 
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in the construction of the optimum design as only one value was available in the literature. 

A recently published population pharmacokinetic study by Koks et al. (2003) gives most 
of the parameter values needed. In particular, it gives the clearance of itraconazole from 
the central compartment and its volume of distribution in the central compartment, and 
therefore the elimination rate constant from the central compartment (k 2 o), and also the 
distribution rate constants (&23, ^32) and the formation rate constant (^24). 

These authors did not report any nonlinear elimination of the drug. But in several other 
pharmacokinetic studies, nonlinear pharmacokinetic behaviour was shown for itraconazole 
(Heykants et al., 1989; Barone et al, 1993; Van de Velde et al, 1996; Van Peer et al, 1989). 
Information about the Michaelis-Menten kinetic parameters (K m and V max ) was consequently 
collected from other sources (Barone et al, 1993). 

Since the aim is to study the relative bioavailability of the oral solution compared to 
the capsule formulation of itraconazole in patients with CF, the different absorption rate 
constants (k^ and k^) must be estimated as well. This was done using 

log. 2 



k a — 



^1/2, absorption 



where £1/2, absorption is estimated by t max /3. The times of peak plasma concentration for 
itraconazole in the oral solution and the capsule formulation from the drug information web 
page of the Food and Drug Administration (FDA) (Janssen Pharmaceutica Products, L.P., 
2002b) were used. 

As no further information on PK parameters was available, the remaining model para- 
meters were estimated by 'best guess'. Table 4.1 shows, along with all previously mentioned 
parameters and constants, the values chosen for the elimination rate constant, the vol- 
ume of distribution, and the clearance of the metabolite (fc 40 , V4, and CL 4 , respectively). 
The best guess values of these parameters were chosen as those that produced a simulated 
concentration-time curve that was closely comparable to profiles provided by the authors of 
other clinical pharmacokinetic studies (Heykants et al., 1989; Barone et al, 1998). Deter- 
ministic simulations were performed using MATLAB (these are shown in Figure 4.2). 

Residual variability was assumed to have both additive and proportional components, 
with <ja = 5 x 10~ 3 mg/L (half of the lowest concentration detectible by the assay) and 
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FIGURE 4.2: Concentration-time profiles of itraconazole (solid line) and hydroxyitraconazole 
(dashed line) after a 200 mg solution dose of itraconazole, with (a) linear elimination of the drug; 
and (b) nonlinear elimination of the drug. The values of the various parameters and constants used 
to produce the plots are given in Table 4.1. 



a P = 0.15. 

Between subject variability was assumed to be lognormal with variance 0.1, i.e. Oi = 
f3exp(bi) with 6; ~ N(0, 0.1 I p ) for all parameters, where I p is the p x p identity matrix. 



4.3.5 Computational methods 

A set of MATLAB routines, POPT® Version 2.0, was used to calculate the optimal design, 
optimising the criterion given in Section 4.2.5. This software allows the user to find D- 
optimal sampling times by specifying the number of blood samples, the upper and lower 
boundaries for the samples, the dose interval for taking blood samples, and the number of 
groups in the study. Duffull et al. (2005) gives further description of the software used. 
POPT® was modified to incorporate the product criterion and numerical solutions to the 
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Table 4.1: Parameter and constant values used to calculated the optimal design. References: 
(A) = Janssen Pharmaceutica Products, L.P. (2002b); (B) = Koks et al. (2003); (C) = 
Barone et al. (1993); (D) = Janssen Pharmaceutica Products, L.P. (2002a); (E) = best 
guess derived from MATLAB simulation. 

Variable Value Source 

k s a = absorption rate constant, solution 
k c a = absorption rate constant, capsule 
Fk a = absorption rate multiplicative factor 
k 2 o = elimination rate constant from the 

central compartment, linear model 
k23 = distribution rate constant 
ks2 = distribution rate constant 
&; 2 4 = formation rate constant 
k^o = elimination rate constant for hydrox- 
yitraconazole 
CL2 = clearance of itraconazole 
CLi = clearance of the metabolite 

Q = inter-compartmental clearance 
Knax = theoretical maximum rate of the 
process 
K m = Michaelis-Menten constant 
V2 = volume of central compartment 
V4 = volume of metabolite compartment 
V3 = volume in peripheral compartment 
CL24 = clearance of itraconazole by 
metabolism to hydroxyitracona- 
zole 
F12 = fraction of parent in the gut absolute 

bioavailability 
F 14 = fraction of metabolite after first pass 

metabolism 
F24 = fraction of parent converted to 

metabolite 
F 2 o = fraction of parent eliminated 
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differential equations. POPT® employs a version of simulated annealing for continuous 
variables similar to the one described in Section 2.4.1. 

Numerical solutions to differential equations 

In the case of the nonlinear model, there does not exist a closed form solution to Equa- 
tions (4.5-4.8), so the responses of interest, A 2 and A^ cannot be isolated analytically. 
Instead numerical solutions to the equations are used. For this purpose, MATLAB's built-in 
ODE solver ode45 was employed. A relative tolerance (RelTol) of 10~ 4 was used in these 
calculations. 

Numerical derivatives 

Once the solutions to Equations (4.5-4.8) are found, their partial derivatives with respect to 
each parameter are required (as in Equation (4.26)). These derivatives were approximated 
numerically using the Central Difference Method, 

df(x) ^ f(x(l + h 1 ) + h 2 ) - f(x(l - fri) - h 2 ) 
dx 2(xhi + h 2 ) 

for a suitably small hi and h 2 . 

Since numerical approximations to both the ODE solutions and the partial derivatives are 
used, it is important that the ODE solutions are more precise than the numerical derivatives. 
For example, if the ODE solutions are correct to the 4 th decimal place, hi and h 2 must be 
chosen such that the changes in /(■) must be larger than 10~ 4 , otherwise the results may be 
inaccurate. An alternative solution would be to use the 'direct method' (see Atkinson and 
Bogacka (2002) for an example of its use). 

4.4 Results 

4.4.1 Optimal design 

The set of optimal sampling times S* are given in Table 4.2. The times have been rounded 
to the nearest minute. As the sampling times have been chosen from a continuous interval, 
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these schedules are difficult to implement in practice. Indeed, in some cases they are actually 
impossible, with several blood samples being taken concurrently. It is therefore necessary to 
extend the population design to a sequence of intervals, or windows, in which the samples 
may be taken. We chose sampling windows of 3 hours (smaller for the earlier sampling 
times), roughly centered around the optimal times. These windows are also shown in Table 
4.2. 

The efficiency of a population design S in this case is given by 

JM*i.*2,5) 

where S* is the optimal population design. Any deviation from the optimal sampling times 
will obviously lead to a decrease in efficiency. The sampling windows were evaluated by 
taking 1000 designs at random from all the given intervals and calculating their efficiencies. 
The 1000 designs were selected by taking a set of sampling times from uniform distributions 
on all of the sampling windows, i.e. these are joint (not marginal) sampling windows. Figure 
4.3 shows the distribution of these efficiencies. The average efficiency was about 0.95, with 
96.2% of the designs having an efficiency of 0.90 or greater. The 'worst' design found had 
an efficiency of about 0.85. The sampling windows were deemed acceptable in light of these 
results, and did not need to be shortened to give any further increase in efficiency. 

4.4.2 Design evaluation 

Model discrimination 

The optimal design was evaluated in terms of its ability to select whether the 'true' underlying 
model involves linear or nonlinear elimination from the parent compartment. 

To do this, 100 sets of data were simulated under each model (linear elimination and 
nonlinear elimination) using the parameter values given in Table 4.1 and the optimal design 
provided in Table 4.2. The models were fitted to each set of data using NONMEM (version 
5 with the G77 FORTRAN compiler, using F0CE k INTERACTION), and the 'best' model was 
chosen empirically as the one with the smallest objective function (which is proportional 
to minus twice the log-likelihood). Recall that CL 2 4, Q and V3 are considered to be hxed 
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Table 4.2: Optimal population design S* for the population pharmacokinetics study of 
itraconazole, with 10 patients in each of the 3 elementary design groups. 
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under both the linear and nonlinear models, and that K m is considered fixed in the nonlinear 
elimination model. 

After simulating under the linear model, the linear model was chosen correctly 74% of 
the time. After simulating under the nonlinear model, the nonlinear model was chosen 
correctly 100% of the time. For all estimation runs, the NONMEM run was reported to have 
converged successfully. 



These results are considered to be acceptable, especially considering that at low con- 
centrations (in comparison to K m ), the linear and Michaelis-Menten (nonlinear) models are 
virtually indistinguishable, meaning that the nonlinear elimination model will often fit very 
well to data generated by the linear model. 
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FIGURE 4.3: Histogram of efficiencies of fOOO simulations of sampling times taken uniformly 
within all of the sampling windows described in Table 4.2. 



Standard errors 



The results from these simulations were also used to calculate standard errors of the parame- 
ters estimates. Empirical estimates of the standard errors were obtained from the standard 
deviation of the estimated parameter values from the 100 simulated data sets. This was per- 
formed only in those cases where the same model was used for simulation and estimation. 
These are given in Table 4.3 for both the linear and nonlinear models. We can compare these 
standard errors to those predicted by the information matrix used to calculate the optimal 
design. These 'expected' standard errors are given by the square roots of the diagonals of 
the inverse of each Fisher information matrix. These are given in Table 4.3. 

The comparison of expected and simulated standard errors reveal quite satisfactory re- 
sults, considering the difficulty in estimating between subject variability and absorption 
rate constants. The only parameter with a significant difference between the expected and 
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Table 4.3: Standard errors (expressed as coefficients of variation) for the population mean 
and between subject variability for parameters estimated under the linear and nonlinear 
models. The expected standard errors are calculated from the Fisher information matrix, 
the estimated standard errors are the standard deviations of parameter estimates from 100 
NONMEM simulation-estimation runs. The dash indicates that normality was rejected 
by the Kolmogorov-Smirnov test (the data was highly positively skewed), in which case 
calculation of standard errors in meaningless. 
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estimated standard errors is k a . The absorption rate constant is notoriously difficult to es- 
timate in PK models, so it is not unexpected that the estimates from the simulations were 
so variable. 

Parameter estimation 



To assess the benefit of the optimal population design with respect to parameter estimation, 
two additional designs were calculated: one is optimal in terms of the linear model alone, 
and the other optimal in terms of the nonlinear model alone. The D-optimal design for 
model u, denoted by S*, was found by maximising the criterion 



|M M (r,,. 



|i/p« 



u= 1,2 



(4.30) 



using the same methods as described in Section 4.3.5. 
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The efficiency of any population design S with respect to model u is defined (in a similar 
fashion to Equation (4.29)) to be the ratio 

" h ^ u> ~ |M U (* U , s*)! 1 ^ 

where S* is the product optimal design given in Table 4.2. 

For the product optimal design £*, the criteria given by Equation (4.30) were calculated 
under the linear and nonlinear models separately. The values found were 37.591 and 115.1, 
respectively. The criteria for designs S* and S|, which are optimal under the linear and 
nonlinear model, respectively, are 38.993 and 120.01. Hence the efficiencies of S* in terms 
of the linear and nonlinear models are: 

37 591 
Effi(S*,S?) = ^^ = 0.9640 
U ' l ' 38.993 



and 



115 1 
Eff 2 (S*,s:) = — ^^ = 0.9591. 

y ' 2 ' 120.01 



This indicates that the product optimal design is suitably efficient at estimating the para- 
meters of both models. 

4.5 Discussion 

The practical example of the itraconazole study is still underway at the time of writing. The 
authors are currently waiting on results from the actual trial. 

For comparison, the empirical 'rich' sampling design given in Section 4.3.3 was used, and 
the expected standard errors from the design were calculated from the information matrix. 
In this case, again in a study involving thirty patients, all parameters in question would be 
estimable. The relative standard error of all fixed effects would all be less than 20%, in most 
cases less than 10%. 

The construction of the optimal design was subject to several constraints. A timeline 
of approximately two months was given to develop the entire design, from collating prior 
information on itraconazole and researching applicable methods, to computing the hnal 
design (a procedure which took over seven days in itself). Future research in this area includes 
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the consideration of alternative methodology (such as the 'direct method' of Atkinson and 
Bogacka (2002) and more sophisticated incorporation of multiple response types for mixed 
effects models). Under the given time constraints, computation time was a major issue, 
with the hnal design computation taking over a week to run (on a 2.4 GHz Pentium 4 
with 1 GB of RAM). Cost limitations of the study and limitations imposed by the relevant 
ethics committee also limited the possible number of patients and the number of samples per 
patient, which further constrained the design in terms of the number of parameters which 
are estimable. 

Several ideas were used in this design which are novel to the held of experimental design 
for population pharmacokinetic studies. These include the use of the D-optimality product 
criterion for model discrimination and the incorporation of multiple responses into the Fisher 
information matrix. 

The results of the power tests and simulations show that the use of this product cri- 
terion has created a design which is efficient in terms of both parameter estimation and 
model discrimination. Whether the individual D-optimal designs were also adequate for 
model discrimination was not investigated for this design. However, we have seen evidence 
(Waterhouse et al., 2004b) that suggests that this is not true for nonlinear models. 

The standard errors predicted by the inverse of the Fisher information matrix generally 
match those estimated by the simulations. We believe that these results show that the 
approximations made by Retout and Mentre (2003) are mostly inconsequential. Also, the 
assumptions about the covariance structure of the two responses in the model seem to be 
justified by these results. 

The sparse sampling design presented in this paper gives a simple and cost-effective data 
collection regimen which should provide acceptable estimates of the parameters of interest. 
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Part II 



Generalised linear models 
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The ideas formed in Part I of this thesis are now carried over to generalised linear models. 
GLMs present some additional challenges in regard to discrimination of nested models. The 
problem of constructing designs for both effective discrimination between two competing 
nested models and efficient estimates of the parameters of both models is addressed, and 
demonstrated for a number of examples. The optimal design of crossover studies in another 
pharmacological application is also investigated, where the structural model is assumed to 
be known. The sensitivity to parameter misspecification of the constructed designs is then 
examined. Finally, we extend the logistic regression model used in the crossover study to 
include random coefficients and evaluate some ideas for optimal design for these models, 
where design for estimation remains a challenge. 
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Chapter 5 

Designs for discrimination and 
estimation in generalised linear 
models 



5.1 Introduction 

The optimal design of experiments for efficient parameter estimation in generalized linear 
models (GLMs), as for the nonlinear models discussed in Chapter 3, may be addressed 
through the use of well-established criteria such as the D-optimality criterion (see Atkinson 
and Haines (1996) for an overview of D-optimal designs for GLMs), although results in this 
area are generally limited to models with no more than two explanatory variables. Model 
discrimination for GLMs, on the other hand, is a problem in optimal design which has 
received relatively little attention. GLMs present additional challenges to the problem of 
model discrimination, which are addressed in this chapter. 

The T-optimality criterion for optimal design for discrimination between two nonlinear 
models was defined in Atkinson and Fedorov (1975a), and examples of its use have been 
shown in Chapter 3 of this thesis. Ponce de Leon and Atkinson (1992) subsequently extended 
this criterion to apply to generalised linear models of the same subclass, i.e. models where the 
responses share the same distribution class. The criterion involves maximising the deviance 
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(as defined in Section 2.2.3) arising from the fit of one model, where the data are generated 
under another model which is assumed to be true. M tiller and Ponce de Leon (1996) describe 
a sequential approach to the design problem, again using the maximum deviance of the 
'untrue' model to choose each additional design point. 

However, the deviance function may not always be a suitable measure of goodness-of-fit 
for binary data (McCullagh and Nelder, 1989). Under certain conditions, the asymptotic 
Xn-p distribution (where n is the number of support points, and p is the number of parameters 
in the model under consideration) does not hold, and the deviance is not independent of the 
fitted probabilities, in which case a large deviance "cannot necessarily be considered to be 
evidence of a poor fit" (McCullagh and Nelder, 1989, p. 119). McCullagh and Nelder point 
out, though, that the deviance function can be useful for comparing two nested models. The 
X 2 distribution is "usually quite accurate" for differences in deviances of two nested models 
(McCullagh and Nelder, 1989, p. 119), and we use this as a basis for criteria for designs to 
discriminate between such models. 

This chapter will assess the ability of T-optimal designs to discriminate between two 
nested logistic regression models, augment these designs with product optimal designs sup- 
port points to improve parameter estimation (as described in Chapter 3), and suggest an 
alternative optimality criterion based on the expected difference in deviance of the models. 

5.2 Methods 

5.2.1 Criteria 

T-optimality 

The T-optimality criterion is now restated for GLMs with binary responses, as given in 
Section 2.3.3, in slightly more detail. Suppose that we have an n-point approximate design 

f*i £2 ••• L 

\wi Wi ... w r 

and that at each support point of the design, we observe the number of successes Si, which 
has a binomial distribution Bin(mj, 7Tj). The proportion of successes Y t = Si/rrii may be 
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modelled by a GLM. Recall from Section 2.2 that a GLM for binomial data is given by 

g(ir i ) = r] i , i = l,...,n, 

and 

rj = (771, ...,tj„)' = Xf3, 

where each row of the n x p matrix X is a p-vector of known functions of £^; (3 is the 
p x 1 vector of model parameters; and the link function g is usually either the logistic link 
function, g^i) = log{7Ti/(l — 7T;)}; the probit link function, g(iTi) = $ _1 (7T;), where $ is the 
cumulative distribution function of the standard normal distribution; or the complementary 
log-log link function, g(iTi) = log{— log(l — 71^)}. 
The log-likelihood function is 

n 

£(tt; Y) = J2 w i % lo §^ + (1 " Y i) M 1 " Ti)] , 

i=l 

where Y = (Y u . . . , Y n )' and 7r = (7Ti, . . . , ir n )'. 

Suppose that we are interested in choosing between two candidate GLMs, Mi and M 2 
with the same link function but different linear predictors, rj 1 and r} 2 respectively, where 
77 x is 'nested' in rj 2 , i.e. r) 2 contains all of the terms from r) 1 as well as one or more extra 
covariates. Let the parameter vectors for models Mi and M 2 be the px-vector (3 l and the 
p 2 -vector (3 2 , respectively. Denote the maximum likelihood estimates of the probabilities 
from each model as tz\ = (ttu, . . . , 7rin)' and ir 2 = (tt 2 i, . . . , TT 2n )' . The deviance from the fit 
of model u (with u = 1 or 2) is defined as 



D(Y;*„) = 2t(Y;Y)-2l(*„;Y) 



^E 



Wi 



yaog(y l /7r M8 ) + (i-y l )^ 



l-Yj 

1 — 7T„, 



For a T-optimal design, the assumption is made that the larger model M 2 is the 'correct' 
model with ir 2 known. The T-optimality criterion is the deviance arising from the fit of M\ 
to the expected value of Y under M 2 , ir 2 . That is, the criterion is 

D(tt 2 ;tti). (5.1) 
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Define the T-efficiency of a design as the ratio of its value of T-optimality criterion, as given 
above in Equation (5.1), to the T-optimality criterion value for the T-optimal design. This 
is analogous to the T e 6 ff efficiency from Equation (3.3). 

Tj;-optimality 

As mentioned previously, the random variable D( Y; iz u ) may not always follow the asymp- 
totic Xn- Pu distribution, and so may be unsuitable as a measure of goodness of fit, and hence 
unsuitable as a design criterion. The reduction in deviance may be considered instead: 

R(Y) = L>(F;7r 1 )- J D(F;7r 2 ) (5.2) 

= 2£(* 2 ;Y)-2£(* 1 ;Y). 

Under M\, R(Y) is distributed approximately as Xp 2 - Pl under the assumptions of no over- 
dispersion and fixed n. 

As an alternative to T-optimal designs for GLMs, consider the average reduction in 
deviance given that the second model is true, i.e. an alternative criterion is Fi(R(Y)\M 2 ). 
As this expectation may not be evaluated analytically, Monte Carlo simulation is used to 
produce an approximation to the criterion: 

E(R(Y)\M 2 )*J2 R ( Y *y N <» ( 5 - 3 ) 

»=i 

where N s is the number of simulations (typically large, at least 10 3 ) and Y* = (Y^*, . . . , Y*^) T 
are the simulated data. The simulated data in this case are the sample proportions, taken 
from a truncated normal distribution with mean 7r 2 and a variance based on the structure 
of the binomial variance, 

Vzr(Y i *) = n 2j (l-n 2j )K, 

where K is chosen to quantify the experimenter's confidence in M 2 (a larger K would result in 
increased variation from M 2 , so would correspond to less confidence in the larger model). We 
refer to a design which maximises this optimality criterion in Equation (5.3) as Tg-optimal. 
The T^-optimality criterion has desirable theoretical qualities in terms of constructing 
designs for model discrimination, as the yj distribution of the reduction in deviance is 
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expected to be more accurate than that of the single deviance used in T-optimality. However, 
due to the large number of simulations required to produce the criterion value for a single 
design, it is not feasible to be used to construct an optimal design, especially when using a 
search algorithm like simulated annealing, which typically requires hundreds of thousands of 
calculations of the criterion. Instead focus is placed on evaluation of the T-optimal designs 
described above and the construction of hybrid designs described below. 

Hybrid designs 

A compromise between model discrimination and parameter estimation is sought, as in 
Chapter 3. To achieve this objective, the hybrid designs described in that chapter may be 
used, as they produce similar results to the conditional designs with much less computational 
effort. This involves finding the T-optimal design, as well as the corresponding product 
optimal design, and combining the design points of the two designs, with weights reflecting 
the importance placed on each objective. 

5.2.2 Power tests 

The ability of an experimental design to discriminate between two competing models may 
be assessed by power tests. In these power tests, a large number (iV sim ) of sets of simulated 
random binomial data s k (a vector of size n for an n-point design, for k = 1, . . . , iV sim ) is 
generated using the true model M^- For a given design £, each simulated vector s£(£) = 
(s£i(£)> . . . , s* kn (^))' is a realisation of S*(£) = (£*(£)> . . . , S^(())', whose elements have the 
following distribution: 

S*(0 ~ Bin([w;iV sam pi e ],7r2i), 

where [x] is the integer nearest to x. iV samp i e is the total size of the sample from which the 
binomial counts are taken in the simulations, chosen in an ad hoc manner to be large enough 
to be able to reasonably fit the models to the data, but not so large that discriminating 
between the two models becomes a trivial exercise. The simulated proportions of success 
Vk(0 = (s/fci(0. • • • >3/fcn(0)' are then calculated by y* ki {£) = s* ki {0/[wiN saiaple }. 

The power of the design is then defined to be the proportion of times that model M 2 is 
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correctly chosen as a better fit to the data than model M\. The choice of the 'best' fit is 
determined by carrying out the likelihood ratio test for testing H : Mi against Ha '■ M 2 . 
The test statistic is the reduction in deviance R(y^(^)) as defined in Equation (5.2), which 
is distributed approximately like Xp 2 - Pl - We select Mi as the more appropriate model on 
the fcth simulation if R(y£(0) > Xp 2 - Pi; o.95- 

5.3 Nested logistic regression models with two factors 

A likely scenario for model discrimination with GLMs is the choice between logistic regression 
models with two factors, where interaction between the two factors is included in one model 
and omitted in the other (so Pi = 3 and p 2 = 4). 

Mi : logit(TTi) = P w + p n x 1 + P i2 x 2 
M 2 : logit(7r 2 ) = P 20 + #212:1 + P 22 X 2 + /fe^il^ 

— 1 < Xi, x 2 < 1 

The choice of bounds for x\ and x 2 was made to 'standardise' the design region. Any bounded 
explanatory variable may be scaled to the same region, as long as the model coefficients are 
altered accordingly. 

5.3.1 Parameter values 

As all of the design criteria presented in this chapter rely on the specification of the para- 
meters for the second model, a value must be set for (3 2 = (P20, P21, P22, P2z)' ■ P20, P21 arid 
P 22 are all arbitrarily set to 1. (The next section considers the case where P 22 , the coefficient 
of x 2 , is twice that of the coefficient of xi, i.e. P 22 = 2. The first set of results will hence 
be labelled as the '#22 = 1 case'.) A range of values for p 2 s (the interaction coefficient) is 
chosen so that the interaction creates a big enough difference between the two models to 
make it worthwhile designing an experiment to discriminate between them, but not so large 
as to make the discrimination problem trivial. 

For a range of values of the interaction coefficient (from —4 to 4), the expected responses 
under model M 2 (7r 2 ) was calculated on a 20 x 20 grid of values of X\ and x 2 , each over 
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the range [— 1, 1]. The smaller model M\ was then fitted to these expected responses using 
iteratively reweighted least squares. Denote the fitted probabilities by ts\ and the fitted 
parameter vectors by /3 1 . The differences 7r 2 — ~k\ are shown in Figure 5.1, along with the 
values of (3 2 and {3 1 in each case. After examining these plots, it was decided that the most 
interesting range of values of the interaction coefficient was from —1 to 1. Outside this 
range, the difference between the models created by the large interaction would make model 
discrimination trivial. 

5.3.2 Designs for discrimination 

The T-optimal designs for the various interaction parameter values are given in Table 5.1 and 
Figure 5.2. Each support point in the figure is represented by its position in the xy-plane, 
and its corresponding weight is proportional to the size (area) of the plotted point. 

The actual support points for the T-optimal designs do not change for different values 
of the interaction coefficient, they are always at the corners of the design region, i.e. the 
support points of the 2 2 factorial design. Only the weights are affected by the value of this 
parameter, and these changes are only minimal. The differences in weights between these 
designs and the factorial design (with equal weights of 0.25), however, is important. The 
2 2 factorial design has reasonably lower T-efficiencies than these T-optimal designs, ranging 
from 0.88 to 0.93. 

The fitted parameter values for model Mi under the T-optimal designs are given in Table 
5.2. 

5.3.3 Designs for parameter estimation 

Assessment of the T-optimal designs above in terms of their ability to produce good para- 
meter estimates was also considered. To do this, locally D-optimal designs were generated 
to compare the T-optimal designs against, in which case values of (3 1 need to be specified 
(/3 2 is already known from the construction of the T-optimal designs). The MLE estimates 
of fii from the T-optimal designs found previously, given in Table 5.2, were used here. 

D-optimal designs for models M x and M 2 are given in Tables 5.3 and 5.4, as well as 
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FIGURE 5.1: The difference between models M\ and M2 over various values of the interaction 
coefficient in M2 when /?22 = 1- For a given /3 2 , fix is the best fit of Mi to the expected responses 
under M2- The colour of the surface is indicative of the difference between the expected response 
of the models: red signifies that -k<i is greater than n±, blue signifies that 7T2 is less than rri- 
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Table 5.1: T-optimal designs for nested logistic models with two variables when p 22 = 1- 
/3' 2 T-optimal design 

(1,1,1,1) 



(1,1,1,0.5) 



(1,1,1,0.3) 



(1,1,1,-0.3) 



(1,1,1,-0.5) 



(1,1,1,-1) 



Xi 


-1.0000 


-1.0000 


1.0000 


1.0000 


x 2 


-1.0000 


1.0000 


-1.0000 


1.0000 


w 


0.2038 


0.2038 


0.2038 


0.3886 


x l 


-1.0000 


-1.0000 


1.0000 


1.0000 


x 2 


-1.0000 


1.0000 


-1.0000 


1.0000 


w 


0.1973 


0.1973 


0.1973 


0.4081 


Xi 


-1.0000 


-1.0000 


1.0000 


1.0000 


x 2 


-1.0000 


1.0000 


-1.0000 


1.0000 


w 


0.1962 


0.1962 


0.1962 


0.4113 


Xi 


-1.0000 


-1.0000 


1.0000 


1.0000 


x 2 


-1.0000 


1.0000 


-1.0000 


1.0000 


w 


0.1994 


0.1994 


0.1994 


0.4017 


x l 


-1.0000 


-1.0000 


1.0000 


1.0000 


x 2 


-1.0000 


1.0000 


-1.0000 


1.0000 


w 


0.2025 


0.2025 


0.2025 


0.3924 


Xi 


-1.0000 


-1.0000 


1.0000 


1.0000 


x 2 


-1.0000 


1.0000 


-1.0000 


1.0000 


w 


0.2132 


0.2132 


0.2132 


0.3604 



Table 5.2: Fitted values of the parameters of the smaller model, given by the T-optimal 
designs for the various values of (3 2 , when f3 22 = 1. 

1*2 K 



(1,1,1,1) (0.7072,0.7072,0.7072) 

(1, 1, 1, 0.5) (0.8679, 0.8679, 0.8679) 

(1, 1, 1, 0.3) (0.9261, 0.9261, 0.9261) 

(1, 1, 1, -0.3) (1.0523, 1.0523, 1.0523) 

(1, 1, 1, -0.5) (1.0728, 1.0728, 1.0728) 

(1, 1, 1, -1) (1.0707, 1.0706, 1.0706) 
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Figure 5.2: T-optimal designs for (a) /3 2 = (1,1,1,1)', (b) /3 2 = (1,1,1,0.5)', (c) /3 2 
(1, 1, 1, 0.3)', (d) (3 2 = (1, 1, 1, -0.3)', (e) p 2 = (1, 1, 1, -0.5)', and (f) /3 2 = (1, 1, 1, -1)'. 
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in Figures 5.3 and 5.4. The product of the T>-optimality criteria was also used to hnd the 
'product optimal' designs (as described in Section 3.1.2 for nonlinear models), which are 
given in Table 5.5 and Figure 5.5. 

The hrst D-optimal design for model Mi presented here (when /3 23 = 1) shares the 
same support points as the T-optimal design, with different weights. For lower values of this 
parameter, however, the point (1,1) disappears to leave a three-point design at the remaining 
corners of the design space. 

More interesting results are seen for the D-optimal designs for model M 2 , where for /3 2 3 
equal to —0.3 or —0.5 the support points are again in the corners of the design region, but for 
different values of /3 2 3 there is either a further deviation from the point (1,1), or an increase 
in the number of support points to 5. 

The product optimal designs show a compromise between the T>-optimal designs for 
models M\ and M 2 . The three corners (—1,-1), (—1,1) and (1,-1) (or points very nearby) 
are again common characteristics of the design, with the position of the remaining point (s) 
seemingly dictated by the corresponding D-optimal designs for model M 2 . Comparing these 
to the T-optimal designs seems to suggest that for greater positive values of the interaction 
coefficient, moving design points away from (1,1) seems to produce designs more efficient in 
regard to parameter estimation. 

5.3.4 Hybrid designs and power tests 

Given the T-optimal designs in Section 5.3.2 and the product optimal designs in Section 

5.3.3, these are combined to form hybrid designs, using the procedure outlined in Section 

3.1.4. Hybrid designs are calculated for many levels of a, from a = (T-optimal design) 
up to a = 1 (product optimal design), in steps of 0.05. For each design, its T-efficiency, its 
D-efficiencies under both models, and its power to discriminate between the two models are 
calculated, using the method described in Section 5.2.2. 

For the power tests, the value of A^ samp i e was chosen by some preliminary investigation, 
where power tests were performed on a number of hybrid designs for all parameter values. 
Sample = 200 seemed to be large enough that both models would fit the simulated data at 
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Figure 5.3: D-optimal designs for model Mi, for (a) f3 l = (0.7072,0.7072,0.7072)', (b) (3 l = 
(0.8679,0.8679,0.8679)', (c) (3 l = (0.9261,0.9261,0.9261)', (d) f3 1 = (1.0523,1.0523,1.0523)', (e) 
(3 X = (1.0728,1.0728,1.0728)', and (f) f3 1 = (1.0707,1.0706,1.0706)'. The values of fr are taken 
from the T-optimal designs for the values of /3 2 given in Table 5.2. 
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Figure 5.4: D-optimal designs for model M 2 , for (a) (3 2 = (1,1,1,1)', (b) j3 2 = (1,1,1,0.5)', 
(c) (3 2 = (1, 1, 1, 0.3)', (d) f3 2 = (1, 1, 1, -0.3)', (e) (3 2 = (1, 1, 1, -0.5)', and (f) (3 2 = (1, 1, 1, -1)'. 
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Figure 5.5: Product optimal designs for models Mi and M 2 , for (a) f3 2 = (1,1,1,1)', (b) 
(3 2 = (1, 1, 1, 0.5)', (c) (3 2 = (1, 1, 1, 0.3)', (d) 2 = (1, 1, 1, -0.3)', (e) (3 2 = (1, 1, 1, -0.5)', and (f) 
/3 2 = (1, 1, 1, —1)'. Corresponding values of f3 1 for these values of /3 2 are given in Table 5.2. 
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Table 5.3: D-optimal designs for model Mi when P 2 2 = 1- The values of /3 1 are taken from 
the T-optimal designs for the values of /3 2 given in Table 5.2. 

— ^7 

f3 x D-optimal design 



(0.7072, 


0.7072, 


0.7072) 


X\ 


-1.0000 


-1.0000 


1.0000 


1.0000 








*2 


-1.0000 


1.0000 


-1.0000 


1.0000 








W 


0.2991 


0.2991 


0.2991 


0.1026 


(0.8679, 


0.8679, 


0.8679) 


X\ 


-1.0000 


-1.0000 


1.0000 










X 2 


-1.0000 


1.0000 


-1.0000 










W 


0.3333 


0.3333 


0.3333 




(0.9261, 


0.9261, 


0.9261) 


x l 


-1.0000 


-1.0000 


1.0000 










X2 


-1.0000 


1.0000 


-1.0000 










W 


0.3333 


0.3333 


0.3333 




(1.0523, 


1.0523, 


1.0523) 


Xi 


-1.0000 


-1.0000 


1.0000 










%2 


-1.0000 


1.0000 


-1.0000 










W 


0.3333 


0.3333 


0.3333 




(1.0728, 


1.0728, 


1.0728) 


Xi 


-1.0000 


-1.0000 


1.0000 










%2 


-1.0000 


1.0000 


-1.0000 










w 


0.3333 


0.3333 


0.3333 




(1.0707, 


1.0706, 


1.0706) 


Xi 


-1.0000 


-1.0000 


1.0000 










X2 


-1.0000 


1.0000 


-1.0000 










w 


0.3333 


0.3333 


0.3333 





least about 97% of the time (data sets where both models did not fit were not included in 
the N s [ m data sets used to calculate the power), but still not so large that the power was 
always close to 1, regardless of the design. iV sim = 5000 was chosen to give reasonably stable 
results. The results for the various parameter values are shown in Figure 5.6. 

As seen in the examples in Chapter 3, it can be seen that as a increases (reflecting an 
increasing importance placed on parameter estimation), the D-efficiencies increase, and the 
T-efficiency decreases. 

The results of the power tests are of more interest here. It is important to note that the 
size of the power is not of great concern here, as the power of a design is highly dependent 
on the size of the samples used in the simulations. Rather it is the trend in power as a 
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Table 5.4: D-optimal designs for model M 2 when /? 2 2 = 1- 



02 






D-optimal design 








(1,1, 


1,1) 


X\ 


-1.0000 


-1.0000 


0.1404 


1.0000 


1.0000 






x 2 


-1.0000 


1.0000 


1.0000 


-1.0000 


0.1403 






w 


0.2500 


0.2461 


0.1289 


0.2461 


0.1290 


(1,1, 


1,0.5) 


x i 


-1.0000 


-1.0000 


0.7036 


1.0000 








x 2 


-1.0000 


1.0000 


0.7034 


-1.0000 








w 


0.2500 


0.2500 


0.2500 


0.2500 




(1,1, 


1,0.3) 


X-[ 


-1.0000 


-1.0000 


0.8119 


1.0000 








x 2 


-1.0000 


1.0000 


0.8118 


-1.0000 








w 


0.2500 


0.2500 


0.2500 


0.2500 




(1,1, 


1,-0.3) 


X\ 


-1.0000 


-1.0000 


1.0000 


1.0000 








%2 


-1.0000 


1.0000 


-1.0000 


1.0000 








w 


0.2500 


0.2500 


0.2500 


0.2500 




(1,1, 


1,-0.5) 


X\ 


-1.0000 


-1.0000 


1.0000 


1.0000 








x 2 


-1.0000 


1.0000 


-1.0000 


1.0000 








w 


0.2500 


0.2500 


0.2500 


0.2500 




(1,1, 


1,-1) 


Xi 


-1.0000 


-1.0000 


-0.7370 


0.7370 


1.0000 






x 2 


-0.7370 


0.7370 


-1.0000 


-1.0000 


1.0000 






w 


0.1264 


0.2486 


0.1264 


0.2486 


0.2500 



changes that is of interest. The slightly irregular shape of the power plots is due to the 
random nature of the power tests. Increasing iV sim would smooth these curves somewhat. 

When the magnitude of the interaction coefficient P 23 is relatively large (1 or —1), the 
two models are very easy to distinguish between for these sample sizes, so the powers are 
nearly always very close to 1. The trends of the powers are easier to interpret for | /?23 1 < 1- 
It seems that an increase in a results in a general decrease in power. This shows that 
putting more weights on the T-optimal design points results in a design with greater power 
to discriminate between the models. This increase in power, however, is almost negligible. 
For these parameter values, the product designs show a power to discriminate between the 
models almost equivalent to that of the T-optimal designs. 
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Figure 5.6: Efficiencies and power of hybrid designs for models Mi and M2, for (a) 
f3 2 = (1,1,1,1)', (b) p 2 = (1,1,1,0.5)', (c) (3 2 = (1,1,1,0.3)', (d) /3 2 = (1,1,1,-0.3)', (e) 
/3 2 = (1,1,1,-0.5)', and (f) f3 2 = (1,1,1,-1)'. Corresponding values of (3 l for these values of 
/3 2 are given in Table 5.2. 
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Table 5.5: Product optimal designs for models Mi and M 2 when /3 22 = 1. 



01,02 



Product optimal design 



01 = 


(0.7072,0.7072, 


0.7072)' 


X\ 


&2 = 


(1,1,1,1)' 




X-2 
W 


fl,= 


(0.8679,0.8679, 


0.8679)' 


X\ 


^2 = 


(1,1,1,0.5)' 




X-2 
W 


Al = 


(0.9261,0.9261, 


0.9261)' 


X\ 


fla = 


(1,1,1,0.3)' 




X'2 
W 


fli = 


(1.0523,1.0523, 


1.0523)' 


X\ 


02 = 


(1,1,1,-0.3)' 




%2 

W 


fl,= 


(1.0728,1.0728, 


1.0728)' 


x l 


^2 = 


(1,1,1,-0.5)' 




*2 
W 


Al = 


(1.0707,1.0706, 


1.0706)' 


X\ 


02 = 


(1,1,1,-1)' 




X2 



IV 



-1.0000 -1.0000 0.1937 1.0000 1.0000 

-1.0000 1.0000 1.0000 -1.0000 0.1938 

0.2757 0.2562 0.1060 0.2562 0.1060 

-1.0000 -1.0000 0.4676 1.0000 1.0000 

-1.0000 1.0000 1.0000 -1.0000 0.4676 

0.2745 0.2655 0.0973 0.2655 0.0973 

-1.0000 -1.0000 0.8018 1.0000 

-1.0000 1.0000 0.8019 -1.0000 

0.2746 0.2697 0.1861 0.2697 

-1.0000 -1.0000 1.0000 1.0000 

-1.0000 1.0000 -1.0000 1.0000 

0.2764 0.2764 0.2764 0.1707 

-1.0000 -1.0000 1.0000 1.0000 

-1.0000 1.0000 -1.0000 1.0000 

0.2771 0.2771 0.2771 0.1687 

-1.0000 -1.0000 -0.9766 0.9777 1.0000 

-0.9745 0.9765 -1.0000 -1.0000 1.0000 

0.1169 0.2767 0.1592 0.2767 0.1705 



5.3.5 Alternative parameter values 

In the previous section, all designs were found assuming that the intercept and both main 
effect coefficients for the larger model were all 1 (i.e. that (i 2 o = /?2i = /?22 = !)• I n this 
section, all of the design optimisations are repeated with f3 22 = 2, that is it is supposed that 
the effect of the second factor x 2 is twice as large as the effect of the first factor X\. 

It can be seen from Figure 5.7 that again the region of interest for the interaction term 
is between —1 and 1. Hence, the design process is repeated for the same values of P 2 3 as 
in the previous section. While the three corner points (—1,-1), (—1,1) and (1,-1) remain 
more or less fixed, the fourth support point is placed further from the corner as the value of 
the interaction coefficient increases. 
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beta2 = 112 4 beta2 = 112 2 beta2 = 112 1 

betal = 0.62686 0.32904 1.1046 betal = 0.78921 0.57434 1.595 betal = 0.89512 0.77455 1.8374 




1 -1 Xo 



1 -1 x. 



1 -1 x 



beta2 = 1 1 2 0.5 beta2 = 1 1 2 -0.5 beta2 = 1 1 2-1 

betal =0.94912 0.88717 1.9312 betal =1.0444 1.1052 2.0422 betal =1.079 1.1947 2.0577 



i 






beta2 = 1 1 2-2 beta2 = 1 1 2-4 

betal =1.1075 1.3005 2.0089 betal = 0.98534 1.1718 1.6299 



i 





FIGURE 5.7: The difference between models M\ and M2 over various values of the interaction 
coefficient in M2, when fill = 2. For a given j3 2 , j3\ is the best fit of Mi to the expected responses 
under Mi- The colour of the surface is indicative of the difference between the expected response 
of the models: red signifies that 7T2 is greater than 7i"i, blue signifies that 7T2 is less than t\\. 
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Table 5.6: T-optimal designs for nested logistic models with two variables when P22 = 2. 



02 






T-optimal 


design 






(1,1,2, 


1) 


X\ 


-1.0000 


-1.0000 


1.0000 


1.0000 






x 2 


-1.0000 


1.0000 


-1.0000 


0.3947 






w 


0.1944 


0.1944 


0.2285 


0.3827 


(1,1,2, 


0.5) 


X-[ 


-1.0000 


-1.0000 


1.0000 


1.0000 






x 2 


-1.0000 


1.0000 


-1.0000 


0.4381 






w 


0.1996 


0.1996 


0.2000 


0.4007 


(1,1,2, 


0.3) 


X-[ 


-1.0000 


-1.0000 


1.0000 


1.0000 






x 2 


-1.0000 


1.0000 


-1.0000 


0.4708 






w 


0.2016 


0.2016 


0.1885 


0.4082 


(1,1,2, 


-0.3) 


X\ 


-1.0000 


-1.0000 


1.0000 


1.0000 






%2 


-1.0000 


1.0000 


-1.0000 


0.6554 






w 


0.2067 


0.2067 


0.1550 


0.4316 


(1,1,2, 


-0.5) 


X\ 


-1.0000 


-1.0000 


1.0000 


1.0000 






x 2 


-1.0000 


1.0000 


-1.0000 


0.7620 






w 


0.2082 


0.2082 


0.1441 


0.4395 


(1,1,2, 


-1) 


X-[ 


-1.0000 


-1.0000 


1.0000 


1.0000 






X2 


-0.9155 


0.9155 


-1.0000 


1.0000 






W 


0.2173 


0.2173 


0.1304 


0.4350 



The T-optimal designs for the various interaction parameter values are given in Table 
5.6 and Figure 5.8. The htted parameter values for model M\ under the T-optimal designs 
are given in Table 5.7. 

In contrast to the T-optimal designs for the original set of parameter values, the support 
points of these designs include some deviation from the four corners of the design region. 
Three corner points remain relatively hxed for all values of /?23, while the x 2 value of the 
'top right' point (the point at (1, 1) for /?23 = — 1) decreases as /?23 increases. 

D-optimal designs for models M\ and M 2 are given in Tables 5.8 and 5.9, as well as in 
Figures 5.9 and 5.10. The product optimal designs are given in Table 5.10 and Figure 5.11. 

The support points of the D-optimal designs for both models, as well as the product 
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Figure 5.8: T-optimal designs for (a) f3 2 = (1,1,2,1)', (b) f3 2 = (1,1,2,0.5)', (c) (3 2 
(1, 1, 2, 0.3)', (d) (3 2 = (1, 1, 2, -0.3)', (e) f3 2 = (1, 1, 2, -0.5)', and (f) f3 2 = (1, 1, 2, -1)'. 
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Table 5.7: Fitted values of the parameters of the smaller model, given by the T-optimal 
designs for the various values of (3 2 , when /3 2 2 = 2. 

"02 ¥l 



(1,1,2,1) (0.6543,0.6544,1.6871) 

(1, 1, 2, 0.5) (0.8283, 0.8283, 1.8736) 

(1, 1, 2, 0.3) (0.8984, 0.8984, 1.9322) 

(1, 1, 2, -0.3) (1.0922, 1.0922, 2.0404) 

(1, 1, 2, -0.5) (1.1448, 1.1448, 2.0496) 

(1, 1, 2, -1) (1.2247, 1.2247, 1.9954) 



optimal designs, show a much greater deviation from the corners of the design region. This 
may be due to the increased nonlinearity of the response surface due to the increase of the 
coefficient of x 2 . The difference between the D-optimal designs for the two models is less 
pronounced than for the original set of parameter values, although the designs for Mi where 
/?23 < seem to require only three support points, whereas the corresponding designs for 
M.2 still include at least four points. These differences are reflected in the product optimal 
designs. 

There are greater differences between the product optimal and T-optimal designs than 
was seen for the original set of parameter values. Given these differences, it may be reasonable 
to expect to see a greater difference in D- and T-efficiencies for the hybrid designs as a 
changes. 

Hybrid designs are again calculated for a range of a, and their T-efficiencies, D-efficiencies 
and powers are calculated. The results for the various parameter values are shown in Figure 
5.12. 

It can again be seen that as alpha increases (reflecting an increasing importance placed 
on parameter estimation), the D-efficiencies increase, and the T-efficiency decreases. 

While for the previous set of parameter values the cases /?23 = — 1 and P23 = 1 were not 
informative in terms of the power calculations due to the large difference between the models 
with and without the interaction term, the results for P23 = — 1 are easier to interpret in 
this case. The powers of the designs calculated with P23 = 1 are still too high to be of any 
practical use in this investigation. A general (often very gentle) decrease in power can again 
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Figure 5.9: D-optimal designs for model M u for (a) /3 2 = (0.6543,0.6544,1.6871)', (b) (3 2 = 
(0.8283,0.8283,1.8736)', (c) (3 2 = (0.8984,0.8984,1.9322)', (d) (3 2 = (1.0922,1.0922,2.0404)', (e) 
(3 2 = (1.1448,1.1448,2.0496)', and (f) f3 2 = (1.2247,1.2247,1.9954)'. The values of fr are taken 
from the T-optimal designs for the values of f3 2 given in Table 5.7. 
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Figure 5.10: D-optimal designs for model M 2 , for (a) (3 2 = (1,1,2,1)', (b) /3 2 = (1,1,2,0.5)', 
(c) (3 2 = (1, 1, 2, 0.3)', (d) (3 2 = (1, 1, 2, -0.3)', (e) f3 2 = (1, 1, 2, -0.5)', and (f) (3 2 = (1, 1, 2, -1)'. 
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Figure 5.11: Product optimal designs for models Mi and M 2 , for (a) (3 2 = (1,1,2,1)', (b) 
(3 2 = (1, 1, 2, 0.5)', (c) f3 2 = (1, 1, 2, 0.3)', (d) (3 2 = (1, 1, 2, -0.3)', (e) (3 2 = (1, 1, 2, -0.5)', and (f) 
f3 2 = (1, 1, 2, —1)'. Corresponding values of 0i for these values of (3 2 are given in Table 5.7. 
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FIGURE 5.12: Efficiencies and power of hybrid designs for models M\ and Af 2 , for (a) 
/3 2 = (1,1,2,1)', (b) /3 2 = (1,1,2,0.5)', (c) p 2 = (1,1,2,0.3)', (d) (3 2 = (1,1,2,-0.3)', (e) 
/3 2 = (1,1,2,-0.5)', and (f) /3 2 = (1,1,2,-1)'. Corresponding values of (3 l for these values of 
/3 2 are given in Table 5.7. 
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Table 5.8: D-optimal designs for model Mi when /3 2 2 = 2. The values of /3 1 are taken from 
the T-optimal designs for the values of /3 2 given in Table 5.7. 



0i 








D-optimal design 






(0.6543, 


0.6544, 


1.6871) 


X\ 


-1.0000 


-1.0000 


1.0000 


1.0000 








X-2 


-0.8544 


0.8544 


-1.0000 


-0.0067 








W 


0.3003 


0.3003 


0.2612 


0.1383 


(0.8283, 


0.8283, 


1.8736) 


X\ 


-1.0000 


-1.0000 


1.0000 


1.0000 








X-2 


-0.7884 


0.7884 


-1.0000 


-0.1980 








W 


0.3110 


0.3110 


0.2730 


0.1051 


(0.8984, 


0.8984, 


1.9322) 


x l 


-1.0000 


-1.0000 


1.0000 


1.0000 








X-2 


-0.7731 


0.7730 


-1.0000 


-0.2691 








W 


0.3162 


0.3162 


0.2816 


0.0860 


(1.0922, 


1.0922, 


2.0404) 


Xi 


-1.0000 


-1.0000 


1.0000 










X-2 


-0.7564 


0.7564 


-1.0000 










W 


0.3333 


0.3333 


0.3333 




(1.1448, 


1.1448, 


2.0496) 


X\ 


-1.0000 


-1.0000 


1.0000 










%2 


-0.7530 


0.7530 


-1.0000 










w 


0.3333 


0.3333 


0.3333 




(1.2247, 


1.2247, 


1.9954) 


Xi 


-1.0000 


-1.0000 


1.0000 










X2 


-0.7735 


0.7735 


-1.0000 










w 


0.3333 


0.3333 


0.3333 





be seen as a decreases, with the product design again showing only a marginally lower power 
than the T-optimal design in each case. 

5.4 Discussion 

The T-optimal and product optimal designs can easily (without further optimisation) be 
combined into a hybrid design which can address, to some extent, the separate objectives of 
model discrimination and parameter estimation, as seen in Chapter 3 for nonlinear models. 
A hybrid design with more emphasis placed on parameter estimation (and hence more 
weight placed on the support points of the product optimal design) has greater D-efhciencies 



94 Designs for discrimination and estimation in generalised linear models 



Table 5.9: D-optimal designs for model M 2 when /3 2 2 = 2. 
02 T>-optimal design 

(1,1,2,1) 



(1,1,2,0.5) 



(1,1,2,0.3) 



(1,1,2,-0.3) 



(1,1,2,-0.5) 



(1,1,2,-1) 



X\ 


-1.0000 


-1.0000 


1.0000 


1.0000 




a: 2 


1.0000 


-1.0000 


-0.0680 


-1.0000 




w 


0.2500 


0.2500 


0.2500 


0.2500 




x i 


-1.0000 


-1.0000 


1.0000 


1.0000 




x 2 


1.0000 


-1.0000 


0.0298 


-1.0000 




w 


0.2500 


0.2500 


0.2500 


0.2500 




X-[ 


-1.0000 


-1.0000 


1.0000 


1.0000 




x 2 


0.9079 


-0.9079 


0.0865 


-1.0000 




w 


0.2500 


0.2500 


0.2500 


0.2500 




X\ 


-1.0000 


-1.0000 


1.0000 


1.0000 




%2 


0.6711 


-0.6710 


0.3621 


-1.0000 




w 


0.2500 


0.2500 


0.2500 


0.2500 




X\ 


-1.0000 


-1.0000 


1.0000 


1.0000 




x 2 


-0.6173 


0.6174 


0.5120 


-1.0000 




w 


0.2500 


0.2500 


0.2500 


0.2500 




x l 


-1.0000 


-1.0000 


-0.2330 


1.0000 


1.0000 


X 2 


-0.4928 


0.4928 


-1.0000 


-1.0000 


1.0000 


w 


0.1990 


0.2452 


0.0611 


0.2446 


0.2500 



but a lower T-efficiency (as defined here), and to some extent a lower power to discriminate 
between models. This difference in power, however, is generally minimal. The product 
optimal design may hence be an acceptable (and easy to calculate) design to address both 
discrimination and estimation objectives. 

It appears that the T-emciency of a design is a good indication as to whether the design 
will result in a high power to discriminate between the two models. The actual relationship 
between T-emciency and power, however, is yet to be seen, and remains the topic of further 
research. 

If the computational problems associated with the TE-optimality criterion are overcome, 
it will be interesting to see whether designs constructed using this method have the same 
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Table 5.10: Product optimal designs for models Mi and M 2 when (3 2 2 = 2. 



01,0; 








Product < 


optimal design 




01 = 


U6543, 0.6544, 


1.6871)' 


X\ 


-1.0000 


-1.0000 


1.0000 


1.0000 


02 = 


[1, 1,2,1)' 




X2 


-1.0000 


1.0000 


-1.0000 


-0.0709 








W 


0.2737 


0.2737 


0.2398 


0.2128 


01 = 


;0.8283, 0.8283, 


1.8736)' 


X\ 


-1.0000 


-1.0000 


1.0000 


1.0000 


02 = 


1,1,2,0.5)' 




X-2 


-0.8626 


0.8626 


-1.0000 


-0.0108 








W 


0.2705 


0.2705 


0.2531 


0.2059 


01 = 


U8984, 0.8984, 


1.9322)' 


X\ 


-1.0000 


-1.0000 


1.0000 


1.0000 


02 = 


1,1,2,0.3)' 




X2 


-0.8035 


0.8034 


-1.0000 


0.0238 








w 


0.2694 


0.2694 


0.2586 


0.2026 


01 = 


1.0922,1.0922, 


2.0404)' 


X\ 


-1.0000 


-1.0000 


1.0000 


1.0000 


02 = 


1,1,2,-0.3)' 




x 2 


-0.6773 


0.6773 


-1.0000 


0.1878 








w 


0.2688 


0.2688 


0.2728 


0.1896 


01 = 


1.1448,11448, 


2.0496)' 


X\ 


-1.0000 


-1.0000 


1.0000 


1.0000 


02 = 


1,1,2,-0.5)' 




%2 


-0.6485 


0.6485 


-1.0000 


0.2782 








w 


0.2696 


0.2696 


0.2768 


0.1841 


01 = 


1.2247, 1.2247, 


1.9954)' 


X\ 


-1.0000 


-1.0000 


1.0000 


1.0000 


02 = 


!l, 1,2,-1)' 




X2 


-0.5996 


0.5995 


-1.0000 


0.6993 








w 


0.2741 


0.2741 


0.2854 


0.1664 



issues as those described in this chapter for T-optimal designs. 
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Chapter 6 



Optimal crossover designs for 
generalised linear models 



To complement the pharmacological application of optimal design for nonlinear models in 
Chapter 4, this chapter is motivated by an application of optimal design for generalised 
linear models to a hypothetical pharmacodynamic experiment, where crossover designs are 
frequently used. 

Pharmacodynamics (PD) is the study of the biochemical and physiological effects of 
drugs. The construction of optimal designs for dose-ranging trials with multiple periods is 
considered in this chapter, where the outcome of the trial (the effect of the drug) is considered 
to be a binary response: the success or failure of a drug to bring about a particular change in 
the patient after a given amount of time. The carryover effect of each dose from one period to 
the next is assumed to be proportional to the direct effect. It is shown for a logistic regression 
model that the efficiency of optimal parallel (single-period) or crossover (two-period) design 
is substantially greater than a balanced design. The optimal designs are also shown to be 
robust to misspecification of the value of the parameters. Finally, the parallel and crossover 
designs are combined to provide the experimenter with greater flexibility. 
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6.1 Introduction 

The drug development process has seen a number of changes over the past 30 years. Many 
of these changes that are pertinent to the design of studies have been oriented to the de- 
velopment of more efficient design processes. From a pharmacological perspective, a ma- 
jor improvement in efficiency has been afforded by the integration of advanced statistical 
methodology into pharmacokinetic studies (Aarons, 1999). Traditionally population phar- 
macokinetic studies were based on intensive sampling strategies, where the same number of 
blood samples (often greater than 12) were taken from each patient. Since the introduction 
of non-linear mixed effects modelling in the early 1980s (see for example Sheiner and Beal 
(1980)), unbalanced, 'sparse' designs have become common practice in population pharma- 
cokinetic studies. For example, the number and timing of blood samples can vary from 
subject to subject, with some subjects providing fewer samples than parameters to estimate. 
In contrast to the flexibility now afforded to population pharmacokinetic (PK) studies, the 
design of crossover trials with multiple treatment periods and treatment sequences are typ- 
ically limited to balanced designs, where all individuals receive all treatments and block 
sequences are designed so that all treatments follow all other treatments at some stage. This 
produces a design which is balanced with respect to carryover effects. Recent research into 
the design of crossover trials for PK studies can be found in Jones and Wang (1998), Jones 
et al. (1999) and Jones and Wang (1999). They show, for two treatments only, that balanced 
designs are optimal for trials that have no carryover effect. This work is theoretical, with no 
presentation of practical application. It is likely that sparse period and unbalanced designs 
will provide an efficient means in which to test the efficacy of a drug without the need for 
time consuming and expensive balanced designs. The aim of this paper is to assess different 
approaches to optimising treatment allocation in parallel and crossover designed studies. 

This chapter first introduces the model for a hypothetical drug with a binary response, 
and gives the information matrix for a crossover design with two periods (which can be 
simplified for the single period case). Optimal and balanced parallel designs (where each 
subject receives only one dose) are then compared in terms of their efficiency. Another 
comparison of optimal and balanced allocations is given, this time for crossover designs, 
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where two doses are given to each subject. This is considered for two cases: where the 
amount of carryover effect from one period to the next is known, and again when it is 
unknown. The sensitivity of the efficiency of these designs to parameter misspecification is 
then assessed. Finally, composites of both optimal parallel and crossover designs are given, 
and their efficiencies are evaluated. 

6.2 Model 

Consider a hctive drug, which elicits an all or nothing response: success or failure. For 
the purposes of this study the hctive drug is based on the triptan like 5-HTid agonists 
for the treatment of migraine (see for example Nestorov et al. (2001)). In these scenarios 
it is assumed that the patients' response can be described by a binary outcome defined by 
treatment success (1) or failure (0) corresponding to alleviation or persistence of the migraine 
headache at some predefined time after the dose, respectively. We have assumed that the 
dose levels may be allowed to vary between and 20 units, each with a hxed probability of 
producing a success, dependent on the size of the dose(s) given. We aim to hnd an optimum 
allocation of dosage sequences (with a maximum of 2 doses per subject), and compare it 
to a balanced allocation. The variation in the data arising from a crossover trial is usually 
explained by an analysis-of- variance model such as 

Yki = ft + 0d[k,l\ + ^d[k,i-i] + Pi + $k + tki, 

where Yjy is the response of subject k in period I to dose (treatment) d[k, I]. 6d[k,i] is the effect 
of dose d[k, I], Xd\k,i-i] is the carryover effect of the dose given in period I — 1, pi is the effect 
of period I, Sk is the effect of the fcth subject, (3 is the overall mean and e^ is the error term. 
It can be convenient to consider the carryover effect of a dose as directly proportional to 
its direct effect (Kempton et al., 2001), in which case \d[k,i-i] would be written as a0d[k,i-i}- 
Another term that may be included in such a model is the interaction between subject and 
period effects. 

In the example used in this chapter, the subject and period effects are ignored. A 
number of references (including Robinson and Jewell (1991) and Yano et al. (2001)) support 
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this assumption by showing that the inclusion of subject effects do not necessarily yield more 
precise estimates of the parameters of primary interest in logistic regression. However, the 
inclusion of random subject effects in this type of model is the focus of the following chapter, 
where mixed effects models are considered. 

The response y^ of the jth patient to the z'th dose is modelled as a logistic regression 
model with p = 3 parameters, 

logit{7r i: ,-} = logit{P(y i:J = 1)} = 6 + Mij + «M(i-i)i (6.1) 

* = 1,2, j = l,...,N, 

where for i > 1, dij is the size of the z'th dose given to the jth patient, and doj = 0. In this 
model, the residual effect of a particular dose is considered to be directly proportional to 
its direct effect (see Kempton et al. (2001)). If a, the level of proportional residual effect, 
is assumed to be known, the vector of model parameters is 6 = (6 , 0{)' . In this case, the 
model is a generalised linear model (GLM), where the predictor (the right hand side of 
Equation (6.1)) is linear in the model parameters. This is assumed in Section 6.4.1, whereas 
in Section 6.4.2 a is considered to be unknown, and so = (6 , 0i, a)', in which case the right 
hand side of Equation (6.1) becomes nonlinear in the parameters, and is no longer strictly a 
GLM. In either case, the probabilities are given by 

-Kij = logit _1 {6>o + 6idij + aM(;-i)j} 
exp{6> + 6\dij + a0id(i-i)j} 
1 + exp{6> + 6\dij + a6id(i_i)j} 

Although the carry over effect can be assumed to be negligible for pharmacokinetic studies 
if a sufficient washout period is included (as per Senn and Ezzet (1999)), the same assumption 
is not necessarily valid for pharmacodynamic studies. 

The response of the jth subject after a single dose (with no previous doses) may be 
simplified to 

l0git{7Ti i } = log (y~^J =6 ° + Mli ' (6 - 2) 

where similarly irij is the probability of success of dose d\j alone, given by 

Try = logit" 1 !^ + Mij} 
exp{6> + Oidij} 



1 + exp{6> + Oidij} ' 
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For an n-point continuous design (as described in Section 2.1), the log-likelihood function 
for the responses in the zth period (i = 1, 2) is given by 

n 

£{ir it y t ) = £ ™ 3 log K y (1 " ^y- yij } , 

3=1 

where w j are the design weights, 7T; = (tt^, . . . , n in )' and r/; = (y iU ..., y in )'. 



6.3 Parallel designs 

Optimal and balanced parallel designs 

Parallel designs are those with only one dose per subject and so the model is simply the 
logistic model given in Equation (6.2), as there are only single doses. The support point of 
the jih elementary design is hence £ • = d\j. We have a standard logistic regression model 
with 6 = (6q, 8i)', and so the information matrix is as given in Equation (2.7): 



M 1 (0,t) = X[W 1 X 1 



(6.3) 



where 



Xi = 



1 d n 



1 d lr 



and W\ = diag{«;j7r 1: ,-(l — 7r l3 -)}. 

The true values of 6 and 6i are unknown, but are necessary for the calculation of Mi. 
The values of these parameters may be chosen (estimated) from past experience or based 
on expert opinion. For the purposes of this investigation select 8q = — 1 and 9\ = 0.3, i.e. 
use the parameter vector 0° = (—1, 0.3)'. These parameter values yield responses similar to 
other triptans where a positive response is seen in approximately 30% of patients for a zero 
dose, and approximately 70% of patients respond positively to the maximum dose (20 units 
in this case) (Nestorov et al., 2001). Consideration is later given to the sensitivity of the 
optimal design to this choice of parameter values. 
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The optimal parallel design £* p ('p' to denote 'parallel') was found using the simulated 
annealing algorithm described in Section 2.4.1: 

fo.00 9.32 

[ 0.5 0.5 

The most likely practical application of this optimal design is to assign half of the patients 
to a group receiving the zero dose, and the remaining patients to a group receiving 10 units 
of the dose. This deviation from the optimal design is still 99.5% as efficient as the original 
design. This optimal design may be compared to a balanced parallel design: 

| 5 10 20 

topi _ ) 

[o.25 0.25 0.25 0.25 
This balanced design has an efficiency of 0.7571 compared to the optimal design. This 
suggests that about 32% more subjects would be required for the balanced design to provide 
the same information as the optimal design. An alternative balanced design with four dose 
levels may be defined with a linear dose range: 

! 5 10 15 

tb P 2 _ ) 

[o.25 0.25 0.25 0.25 
This design has an efficiency of 0.8367, meaning that approximately 20% more subjects would 
be required to match the information provided by the optimal design. A further alternative 
still could be the balanced parallel design with hve dose levels: 

f 5 10 15 20 

tbp3 _ ) 

[o.2 0.2 0.2 0.2 0.2 
This design with more support points actually has a lower efficiency (0.7217) than the 4-point 
alternatives. 

6.4 Cross-over designs 

6.4.1 Optimal and balanced crossover designs, a known 

Now consider crossover designs as an alternative to parallel designs, where each subject 
receives two doses, and the carryover effect a is assumed to be known. This case is quite 



6.4 Cross-over designs 



103 



artificial and is included as a comparator for the parallel design, where the carryover effect 
is of no consequence. The elementary design for the j support point is now £ ■ = (dij, d 2 j)'. 
As discussed in Section 6.2, when a is treated as a known fixed constant, the model is the 
standard GLM with the logit link function. The information matrix is therefore given by 

M(0,O = M 1 (O,O + M 2 (0,O, 

where M\ is the information matrix for the first period responses, given in Equation (6.3). 
M 2 is the information matrix for the second period, given by 



M 2 (e,o = x 2 'w 2 x 2 



(6.4) 



where 



X 2 



1 d 2 i + adu 



ad\ 7 



J- d 2n 
and W 2 = diag{w;7r 2 ;(l - 7r 2i )}. 

Since M is again dependent on 6 = (8o,6i)', as well as a, it is necessary to specify or 
know their values in order to evaluate the optimal design. The values of 8q and 6\ are again 
set at —1 and 0.3, respectively, to give a parameter vector of 0° = (—1, 0.3)', and we choose 
a proportional carryover factor of a = 0.25. The degree of sensitivity of the optimal design 
to these parameter values is investigated later. 

The optimal design in this case, £* cfc , where 'ck' denotes 'cross-over with a known', is 

r * = ,(0-00,9.32) 



where the ordered pair in the first row, as for all crossover designs presented from here on, 
represents the pair of dose levels (d\j,d 2 j), and the second row contains the design weights. 
That is, in this case the design requires that all subjects receive a unit dose (placebo) in the 
first period, followed by a 9.32 unit dose in the second period. A likely practical application 
of this design would require that all patients receive units followed by 10 units of the drug. 
This design is 99.5% as efficient as the optimal design. 
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Table 6.1: First balanced crossover design with four dose levels, ^ bcl . The support points 
of the design are given by the combinations of the levels of the first and second doses, the 
design weights are given by the corresponding table entries. 

Second dose 
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Table 6.2: Second balanced crossover design with four dose levels, £ 6c2 . The support points 
of the design are given by the combinations of the levels of the first and second doses, the 
design weights are given by the corresponding table entries. 

Second dose 
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X: 
O 
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This design is compared to two potential balanced crossover designs with four dose levels, 
as shown in Tables 6.1 and 6.2. The hrst balanced crossover design, £ fccl , with four potential 
dose levels 0, 5, 10 and 20 units, has an efficiency of 0.7087 compared to the optimal design 
(* ck . This balanced design would require approximately 41% more subjects to gain the same 
information as the optimal design. The other balanced crossover design, ^ bc2 , consists of 
dose levels 0, 5, 10 and 15 units, and has an efficiency of 0.7819. This means that 28% more 
subjects would be required to match the optimal design in terms of information gained from 
the study. 

6.4.2 Optimal and balanced crossover designs, a unknown 

Here it is assumed that a is now an unknown parameter (a more likely situation), that is the 
carryover effect, usually a nuisance parameter, is now included in the design process. The 
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information matrix again consists of the sum of the information from both periods: 

but the matrices M\ and M 2 now each incorporate a row and column corresponding to the 
new parameter a. The situation is now complicated slightly, as the model for the second 
period response is no longer strictly a GLM since the predictor 

V2j = o + 6id 2 j + aOidij 

is not linear in the parameters. 

Consider the log-likelihood function for the responses from period i: 

n 

e(n t , Vi ) = y, uj log {<r (i - *tf) 1-Bw } > 

3=1 

where Wj are the design weights. Taking the second derivative of this function with respect 
to any two parameters 9 U and 6 V yields 



d 2 e.(n h y { 
d6 u d6 v 



E 

3 = 1 



Vij ^^ij Vij 97T iJ d7T i 



■Kij 86 u d6 v irfj d6 u 86 v 



E 

3 = 1 



1 - y tj d 2 ir ij 1 - y i:j d-Ky d 



i\ 



ij 



1 - TTij dd u dd v (1 - TTij) 2 



%J } ^ W U ^ w V 



Now, as E(yij) = tt^ and -^- = 7r^(l - ir^)-^ 1 , we have 



M,(6>,0 = E 



d 2 £(7Ti, yi 



E 

3 = 1 



dr]ij dr\ 



1J L/ l/tj 



Tr ij (l-Tr ij ) = F!W i F i 



where 



Ft 



ae 1 



9rg n 
d0 1 



d0 D 



d0 D 



and Wi = disig{wj7Tij(l — n^)}, as dehned previously. Specihcally, we have 



F-i 



1 d n 

1 d ln 



and F 2 = 



1 d 2 i + adu 6id u 



1 d 2n + ad Xn 6idi r 
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The D-optimal design £* CM below, where 'cu' denotes 'cross-over with a unknown', was 
found by using this information matrix and with parameter vector 0° = (—1, 0.3, 0.25)': 

f (0.00, 0.00) (0.00,9.07) (20.00,0.00) 
| 0.039 0.572 0.389 

A more practical version of this design might be the following: 

r „ = f(0,0) (0,10) (20,0) 
[ 0.05 0.55 0.4 

This design is 99.5% as efficient as £* cu . The optimal design was again compared to the two 
potential balanced crossover designs given in Tables 6.1 and 6.2, as used for the previous case 
of a known. The hrst balanced crossover design, £ bcl has an efficiency of 0.6581 compared 
to this optimal design £* CM . This balanced design would require approximately 52% more 
subjects to gain the same information as the optimal design. The other balanced crossover 
design, £ t bc2 has a slightly lower efficiency of 0.6394, meaning that 56% more subjects would 
be required to match the optimal design in terms of the information gained from the study. 
The higher efficiency and practicality of the optimal design, with only three combinations 
of dose levels, makes it a much more attractive alternative to the experimenter. 

6.5 Sensitivity analysis 

All .D-optimal designs described in this chapter are locally optimal, that is they depend on 
a particular set of parameter values. This section investigates the impact that this has if the 
true parameter values vary from the specified values used to generate the designs. 

A 'joint' sensitivity analysis is considered here, where the impact of varying all parameters 
together is considered, rather than a 'marginal' sensitivity analysis which would involve 
investigating the effects of varying only one parameter at a time. 

For the purpose of this analysis, the dependence of the optimal designs on the parameter 
values is emphasised by writing the design as a function of 0: £,*(0). The original optimal 
designs were calculated using = (— 1, 0.3, 0.25)' (the last element of was of course only 
included for the hnal optimal design), so we call these designs £* p (0°), £* cfe (0°) and £* cu (0°). 
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The sensitivity analysis involves generating a number (500 in this case) of new parameter 
vectors by sampling uniformly between (—1.5,0.1,0.1)' and (—0.5,0.5,0.4)'. This ensures a 
reasonable spread of parameter values, centred on 0°. For each new parameter vector 0i 
(i = 1, . . . , 500), the three optimal designs £* p (0;), £* cfc (0i) and £,* cw (Oi) are generated and 
compared to the original optimal designs by calculating the following efficiencies: 

iM^r^a ))! 172 



Eff? = 
Efff = 



\M(e h ^(0i))\ 1/2 

\M{0 h C ck {0°))\ 1/2 

iM(6>,,e c; w)i 1/2 



FffCM = \M(0 t ,C cu (O°))\ 1/3 . = 

These efficiencies are represented by the histograms in Figure 6.1. It can be seen from the 
histograms that the efficiencies of the original optimal designs rarely fall below 0.7 in each 
case, and are above 0.85 for at least half of the simulated parameter values in each case. The 
optimal design for the case where a is unknown is particularly robust, with an efficiency of 
at 0.85 or greater for a very large number of new parameter values. 

For comparison, the histograms of efficiencies of the corresponding balanced designs are 
also shown. The balanced designs with dose levels 0, 5, 10 and 15 units (£ 6p2 and ( bc2 ) 
were used for a fair comparison, as they were usually the most efficient type of balanced 
design (being only slightly less efficient for the case where a is unknown). The efficiency 
corresponding to the original parameter values (the mid-point of the ranges used in these 
simulations) are also shown by the dotted lines. For each of the three types of designs, the 
balanced design is more likely than not to be less efficient for different parameter values 
than it is for the originally specified parameter values. In contrast to the optimal designs, 
the efficiencies of the balanced designs rarely rise above 0.85, and often fall below 0.7. The 
efficiency of the balanced design when a is unknown is particularly poor. 

6.6 Union designs 

Finally, consider a union of parallel and crossover designs, in which some subjects receive one 
dose and some receive two. Greater information will obviously be gained from the patients 
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Figure 6.1: Histograms of efficiencies for the sensitivity analyses for optimal (left) and balanced 
(right) designs, for (a) & (b) parallel designs; (c) & (d) crossover designs where a is known; 
(e) & (f) crossover designs where a is unknown. The dotted lines on the histograms on the 
right hand side represent the efficiencies of the balanced designs for the original parameter values, 
9= (-1,0.3,0.25)'. 
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receiving two doses, but if the loss of efficiency by including some subjects in a single period 
parallel group is not too great, this will allow for greater flexibility in the clinical trial. The 
more realistic model with a unknown is used for these designs. 

We dehne a union design as a combination of each of the optimal designs for the parallel 
and crossover studies. The proportion of patients allocated to the parallel group is denoted 
by p. Given the optimal designs 

£*P £*p I I *tCIl £* 

p*P J p > and t*<"" ' 



W *P 



pi I -L 



for the parallel and crossover models, respectively, we write the union design as 




£ 



P 



p w T ■ ■■ p w Z 



SI • • • ^>n cu 

(i - P ) w *r ... (i - p)<z 



c ™ . 0.00 9.32 
" 0.25 0.25 



So ^ mon = £*p and Q nion = £* cu - This is similar to the idea of hybrid designs used in 

Chapters 3 and 5. For example, for equal allocation between the two groups (p = 0.5), 

1(0.00,0.00) (0.00,9.07) (20.00,0.00) 
! 0.020 0.286 0.194 

Since the model is the same as in section Section 6.4.2, the information matrix remains 

the same. It can be shown that M(0, £™) = pM{0^* p ) + {l- p)M{0^* cu ). As mentioned 

previously, greater information will be gained by allocating a greater proportion of patients to 

the crossover arm of the trial. That is, the D-optimality criterion of £" mon will be maximised 

for p = 0. Hence we dehne efficiencies of the union designs as 

| M (0^union)|l/3 



Eff(£ 



union > 
P > 



|M(0,£* CM )| 1/3 ' 

Efficiencies will obviously decrease as p approaches 1. However, parallel trials will gen- 
erally be cheaper to conduct, as they require fewer doses and less of the patient's time. In 
this light, efficiencies must be adjusted for the reduction in cost resulting from allocating a 
patient to the parallel trial rather than the crossover trial. In order to do this, the infor- 
mation matrices must be adjusted for the reduction in cost. If C is the reduction in cost 
incurred by allocating a patient to a parallel design, the information gained by patients in 
the parallel group is inflated by the same amount: 

M c (0, C lion ) = PCM(0, C p ) + (1 - P)M(0, e cu ). 
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Figure 6.2: Efficiencies of union designs. 



The efficiencies are adjusted accordingly: 



E$ c (( 



union \ 
P ' 



|M c (6>,^ nion )| 1/3 
|M(0,£* CM )l 1/3 ' 



These adjusted efficiencies are shown in Figure 6.2 for a range of p (from to 1); and for 
C = 1 (no reduction in cost), C = 1.5 (a 50% reduction in cost), and C = 2 (the cost of the 
experiment is directly proportional to the number of doses used). 

The experimenter may be interested in determining the maximum number of patients 
that can be allocated to the parallel group before the adjusted efficiency drops below a 
certain level, say 0.9. For the C = 1 case, no more than 19.4% of the patients can be to the 
parallel group before the efficiency of the design drops below 0.9. For C = 1.5 and C = 2, 
the maximum proportions of patients which can be placed in the parallel arm of the study 
increases to 32.6% and 55.1%, respectively. Depending on the magnitude of the extra cost 
involved with running a crossover trial, a good mixture of patients in the two treatment 
arms can be achieved without a great loss of efficiency. 
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6.7 Discussion 

It has been shown that for a model with proportional carryover effects and binary outcomes, 
an optimal allocation of treatments is significantly more efficient than a balanced allocation, 
whether parallel designs, crossover designs or a combination of both in a union design are 
being considered. Further, it appears in light of the sensitivity analyses performed that the 
optimal designs are quite robust against parameter misspecification, and given that often 
experimenters have a reasonable idea of the prior distribution of the parameters, the results 
here are of practical value. They present experimenters with the potential of savings in terms 
of number of patients in an experiment without any loss in efficiency. 

In addition to savings in terms of number of patients, the work here highlights that 
optimal designs may offer greater flexibility to the experimenter since with the union designs 
not all patients are required to receive both periods of treatments while these designs retain 
the advantages of the crossover structure. It is assumed in this setting that allocation to the 
parallel and crossover groups are randomised. 

Although these findings are currently of theoretical interest only, at this stage, they do 
have significant potential for application in early clinical studies where significant PK data 
and some PD data may already be available and optimisation of first or second use in patients 
could be optimised further. The final design may well be the optimal design (perhaps even a 
union design) augmented with additional design points to ensure that the important clinical 
questions can be answered. 

In the next chapter the effect of incorporating random coefficients in the models, to allow 
the model parameters to vary randomly between patients, is considered. 
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Chapter 7 

Optimal designs for generalised linear 
mixed models 



The GLMs in the previous two chapters have all been hxed effects models, i.e. all of the 
parameters are assumed to be fixed for all observations. In some circumstances we may wish 
to allow the parameters to vary between 'blocks' or 'clusters', and assume that observations 
within blocks (eg. subjects) are correlated. This is referred to as a mixed effects model. An 
example of a nonlinear mixed effects model is employed in Chapter 4, where the pharma- 
cokinetic parameters such as clearance and volume are allowed to vary between patients, a 
much more realistic assumption than a single fixed clearance and volume for each patient. 

If some parameters in a GLM are assumed to vary randomly between blocks, the model is 
known as a generalised linear mixed model, or GLMM. This chapter will focus on D-optimal 
design for GLMMs, in particular it is concerned with logistic regression models with random 
coefficients. The calculation of the information matrix presents a significant problem for 
GLMMs. The log-likelihood cannot be written down in closed form, hence either numerical 
methods (such as integration by quadrature) or approximations to the log-likelihood (such 
as the one used in Longford (1994)) must be used. Unfortunately, numerical methods can be 
very computationally intensive, even for very simple models and designs, and approximations 
can lead to large inaccuracies. 

This chapter introduces an approximation to the information matrix which is not as 
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elegant as that of Longford (1994) (and hence has longer computation times), but is a more 
general approach and gives a significant increase in accuracy. Both approximations are then 
applied to the pharmacodynamic model used in the previous chapter, and the efficiencies of 
the resulting designs are compared. 

7.1 Information matrix 

Consider the logistic model with between- cluster variation for y^-, the z'th of rij responses 
from the jth block (eg. subject), in Equation (2) of Longford (1994), 

logit{P(^ = 1^-)} = Xijfi + Zi^Sj, i = l,..., nj , j = l,...,N, (7.1) 

where the z t j are subsets of the explanatory variables as^ and elements of the Sj are in- 
dependent standard normal random variables. Let Xj = (x{-, . . . , «e£ ■)' and similarly let 
Zj = (z{j, . . . , z^.)'. A distinction is made between the fixed parameters (3 = (Pi, . . . , P p )' 
and the random parameters A = (Ai, . . . , X q )' which are the unique elements of S. The 
entire vector of parameters is then written as \I/ = (/3', A')'. 

The log-likelihood function for the jth block (with response vector yj = (yij, . . . , y n j)') 
is written 

1; = £(¥, Vj ) = bg J - r - j Ptf^QJdSj 



where 



»=i 



and $ q is the g-dimensional standard normal probability density function. 

For an iV-point exact design with response vector y = (y[, . . . , y' N )' , the overall log- 
likelihood is 

N 
3 = 1 

and is given by 

n 

* = *(*, y) = $>**(*, «,-), 

3 = 1 



7.1 Information matrix 
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for an n-point approximate design, where Wj are the design weights. 

Recall that the Fisher information matrix for the jth block, a function of the parameters 
\1/ and of the elementary design £ -, is defined as 



M F (¥,£,) = 



,'dL dL 

E ' J 3 



= -E 



( d 2 L 



a 2 e j a^ij 



= -E 



9/39/3' df3d\' 

a 2 ^ a 2 e j 

8X0/3' d\dX 



q B 3 



(7.2) 
(7.3) 



(7.4) 



7.1.1 Numerical methods 

For an exact design, the expectation in Equation (7.3) may be calculated by the following 
sum: 



E 






Vij 






where the sum is over the 2^ nj possible outcomes for yj, and the derivatives are evaluated 
numerically, using finite difference methods (Press et al., 2002). The log- likelihood may be 
calculated by either of two ways: by adaptive Lobatto quadrature (Gander and Gautschi, 
2000) via the quadl function in MATLAB when q = 1, in which case the information 
matrix (assumed to be an accurate representation of the 'true' information matrix) is denoted 
by iWjv(Vl/, £); or by the same approximation to the log-likelihood used in the next two 
methods, in which case the information matrix is denoted by 7Vfjva(^ , ) 0- ^ i s assumed that 
Mjy( 1 5 r , £) is an accurate representation of the 'true' information matrix, as it contains no 
approximations, and any numerical error associated with the differentiation and integration 
is assumed to be negligible. 

These numerical methods are only applicable to exact designs, as they require summation 
over possible outcomes for yj at each support point. This is not defined for an approximate 
design. 
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7.1.2 Longford approximation 

It can be shown that the log-likelihood can be approximated by the following second-order 
Taylor expansion: 



N 
' 0. 



E 

3 = 1 

N 

E 

3 = 1 



log{P,-(0)} + ^1,-6,--^ log |G,- 



where 



G 3 = I q + Y} t2 Z' j W j Z j Y> t ' 1 , 
Wj = diag{wj}, 



w 



■j 



= (wij, ..., w nj )', and 



W {j = 7T ii (0)(l-7T ii (0)). 

Using this approximate log-likelihood, Longford (1994) hnds the expectation of its deriva- 
tives to give an approximation to Aj, Bj and Cj in Equation (7.4) by ignoring the dependence 
of w^ = 7Tjj(l — 7Tjj) on f3: 

A 3 = X' 3 V-'X 3 , 

where Vj = W 3 ~ l + ZjYiZ'j\ the (u, v) element of Bj is 



3 2 V 3 da 2 u 3 dol 

and Cj = 0. The information matrix calculated by this method is denoted by Mt($,J). 

Some investigation (presented later in this chapter) using the numerical methods outlined 

in Section 7.1.1 have shown that such an assumption may lead to significant error. In the 

next section, a more accurate approximation to Equation (7.4) is given by acknowledging 

the dependence of wij on (3. 

7.1.3 Approximation acknowledging dependence of Wy on (3 

The approximation to the information matrix presented in this section is denoted by Mw(^, £)> 
where it is hoped that the W in the subscript will remind readers of the dependence of w i3 
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(elements of the Wj matrix) on (3. It is hoped that acknowledging this dependence will 
produce a more accurate approximation to the information matrix. In particular, it will 
account for interaction of the hxed and random effects by finding a nonzero matrix Cj in 
Equation (7.4). 

For any given parameters ip m and ip n , 



d 2 L 



9 2 log{P,(0)} , 1 d 2 e' J H J e J 1 9 2 log|G, 



di/j m dip r , 



dip m dip n 2 di\) m di\) n ^ 2 ^ dip m dip r , 

Term 2 Term 3 



Term 1 

Each of these three terms in turn are considered in turn. 



Term 1 



Firstly, Pj(0) = Y\a=i ^tj (1 ~~ ^ijY Vij does not involve any of the random parameters, 



so 



for any m = 1, . . . , q, 



91og{Pj(0)} 



= 0. 



So clearly 



u A rn ij A n 

dXmdfin 



= 0, m, n = 1, . . . , q, and 
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Now consider one of the fixed parameters, (i m : 
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Now, 










OPr, 



= X 



Ij 1 1 1 



1 + exp{-Xij(3} 
exp{-Xijf3} 



(l + expj-^/3}) 2 
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1\ 



ijj 



So 



•^ijm m ij 
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df3 m 
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%j ^ ijm ^ ijm"ij \ 



Taking the second derivative, 



where -Xj( m ) is the mth column of Xj. 



Term 2 



For any parameter ip r 
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(7.5) 



(7.6) 
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The last step is true because 



3 3 d^ m 



3 J dip m 



(scalar) 



dip n 
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—H jej 
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(Hj symmetric) 



The second derivative: 



Now, 
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And, similarly to Equations (7.5)-(7.6), 
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Fixed parameters only 
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We have 



d 
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where T ljm = diag(xi im (l - 2ir 1:j ), ..., x n jm (l - 2?r n j)). So it follows that 
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Considering each term of Equation (7.7), 
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® Hj ^^,1/2^-1^,1/2^/ 9 Wj 
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Random parameters only 

Since e 3 does not depend on any random parameters, Equation (7.7) reduces to simply 
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One fixed parameter, one random parameter 

Equation (7.7) simplifies to 
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dp m d\ n y J dX n dX n J J dp m 
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Derivatives of X 1 ' 2 

It is assumed that X = diag(Ai, . . . , X q ), so X 1 ' 2 = diag(A/ , . . . , X q ). So we have 
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d 2 log | Gj 

dip m dip n 



d 

dlpr. 



= tr [ G 



tr G 



_ x dGi 

dip m 

- l J^G^_ G _ 1 dG lG _ 1 dG 1 
di/j m dip n J dtp n J dtp m 



The first and second derivatives of Gj are given in Equations (7.9)-(7.10). 

Expectation 

Since neither of Terms 1 and 3 contain random variables, it only remains to hnd the expec- 
tation of Term 2. If je is a vector of random variables with mean m and covariance matrix 
S , we have 
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Now 

E(e,-) = E[Wr\ yj - Wj .)] = Wr\E( yj ) - tt,) 
Cov(e,-) = Cov[W7 1 (y i -7r i )]= W^Covfo,-) WT 1 

where 7Tj = (ir^, ..., TT njj )'. 

E( yij ) = E^|tf,-)] = E[T S («)] 

= J mj nij ^ j) j2^ eXV{ ~2 5 ' j6j}ddi (7 ' U) 

The (z, fc)th element of Cov(r/j) is given by Eiy^ykj) — E{y i j)E(ykj). If z = A;, 

ViyijVkj) = E(yJ) = E(y ii ), 
otherwise 

^(yijVkj) = E[E(y ij y kj \d j )]=E[7T ij (8)7T kj (6)] 

= J - R ; J Kijidj^kjid) /2 exp{--6' j 5 j }dd j . (7.12) 

The integrals in Equations (7.11) and (7.12) may be calculated by quadrature (again, using 
the quadl function in MATLAB when q = 1). 

7.2 Example: Dose-ranging trial 

In this section, the problem of optimal design for a dose-ranging trial as described in Chapter 
6 is considered. In this case, however, some of the parameters are allowed to vary randomly 
between subjects. The GLMM and its corresponding optimal designs are described below. 

7.2.1 Model 

Suppose that N individuals are given 2 doses (in sequence) of a drug which elicits a binary 
response (eg. success = 1, failure = 0). It is assumed that the effect of each dose carries 
over into the next period. The model is simplified slightly from the previous chapter, and 
the carryover effect of a dose is no longer assumed to be proportional to its direct effect. 
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Conditional on the j th individual's parameters, the probability of success of its i th dose can 
be modelled by the logistic regression model with p = 3 parameters, 

logit{P(y ii = l)\0j} = jO + Ojidij + 9 j2 d {i _ 1)j 

i = l,2, j = l,...,N, 
where 

and where for i > 1, dij is the amount of the i th dose given to the j th patient, and doj = 0. 
6jo is referred to as the intercept parameter, 6j\ represents the direct effect of the dose, and 
6j2 represents the effect of the dose carried over into the following period (the carryover 
effect). The vectors of parameters Oj are assumed to be independent random samples from 
a normal distribution, 

or alternatively Oj = (3 + bj where 

bj ~ N(0,S). 

This model assumes that every parameter varies randomly between subjects. A more 
general model may be formed by allowing only q of the p parameters to be random, and by 
treating the remaining parameters as fixed: 

logit{P( yij = l)\0j} = XijBj, 

i = l,2, j = l,...,N, 

where bj ~ N(0, S) is now a column vector of length q < p, and the Zij (row vectors 
of length q) are the relevant subsets of the vectors Xij. This is equivalent to the model in 
Equation (7.1). Only the case with a single random parameter at a time is considered (q = 1) 
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in order to simplify calculation. A larger q will increase computation times considerably, 
due to the need for numerical evaluation of double or triple integrals. In practice, the 
intercept parameter is usually the only parameter assumed to vary between subjects in a 
pharmacodynamic model such as this (Yano et al., 2001). Less frequently, the direct effect 
parameter is assumed to be random. Due to the single random parameter (q = 1) restriction 
imposed by computational constraints, only 6j and 6ji will be considered random, as a 
random carryover effect would be highly unlikely without a random direct effect also in the 
model. 

7.2.2 Optimal design 

Dehne an elementary design for the j th individual by the vector of doses £ ■ = (d\j, (%)'. Both 
exact and approximate designs are considered, and are evaluated by using the D-optimality 
criterion, based on (a) Longford's approximation, .M£,(vl/, £); (b) the more accurate approx- 
imation acknowledging the dependence of Wij on (3, Mw(^f, £)> ( c ) the numerical method 
Mjv„($, £) based on the approximate log-likelihood; and (d) the numerical method involving 
numerical differentiation and integration, but no algebraic approximation, M^(^, £). It is 
expected that methods (b) and (c) will produce very similar D-optimality criteria. In fact, 
(c) will only be used as to test that the complex criterion (b) is coded correctly. 

As the numerical method .Mjv(\l/, £) is extremely computationally intensive (a single 
calculation of the information matrix for a very small design takes several minutes on a 2.4 
GHz Pentium 4 PC), it will not be used for optimisation. Instead, it will only be used to 
compare efficiencies of designs generated by the hrst two methods, as it is assumed to be 
the most accurate representation of the information matrix available to us. As described 
in Section 7.1.1, this numerical method is only defined for exact designs, so even though 
both exact and approximate designs will be found using the two approximations, only the 
efficiencies of the exact designs will be compared. 

All designs are locally optimum, with parameter means based on those used in the pre- 
vious chapter: 

/3 = E(Gj) = (-1, 0.3, 0.25 x 0.3)' = (-1, 0.3, 0.075)'. 
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7.2.3 Results 

Before any optimal designs were found, a quick comparison of the four methods was per- 
formed. A 6-point exact design was chosen at random, and the D-optimality criterion was 
calculated using the four different information matrices. The intercept and direct effect pa- 
rameters were assumed to be random, one at a time, and a range of variances of the random 
effects was used. Figure 7.1 shows the results. It can be seen that the 'less approximate' 
approximation to the information matrix (Mw) gives almost exactly the same results as the 
more time-consuming numerical method using the approximate log-likelihood (M^ a ) as ex- 
pected. Both of these are significantly closer to the 'gold standard' numerical method (Mjv) 
than Longford's approximation (Ml), especially for larger variances. However, all three 
methods involving the approximation to the log-likelihood still seem to be fairly inaccurate 
when compared to M^, all over-estimating the criterion, and hence under-estimating the 
standard errors of parameter estimates on average. In light of this, the efficiencies of the 
exact optimal designs found with the computationally feasible algebraic methods (Ml and 
Mw) will be compared in terms of the most accurate values of the criterion, calculated with 
M N . 

Exact designs 

Optimal exact designs are given for the two methods Ml and Mw in Tables 7.1 and 7.2, 
along with their criteria calculated by three different methods: Ml, Mw, and -Mat. When 
comparing two designs, the values of \M^f(^f, ^)| 1 /(p+«) are compared since it is considered 
the closest to the true value of the D-optimality criterion. The ratio of these criteria give 
the efficiency of one design compared to another. Only 6-point designs are considered here 
due to the long computation times for some methods. 

These optimal designs are compared to the optimal 6-point exact design for the fixed 
effects model, which has 4 points with di = 0, d 2 = 9.11, and 2 points with di = 20, d 2 = 0. 

Table 7.1 shows D-optimal 6-point exact designs where the intercept parameter (6jo) 
varies randomly between subjects, and the other two parameters are hxed. For a 10% 
coefficient of variation (CV), the more accurate method (Mw) actually produces a design 
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FIGURE 7.1: Comparison of different methods of calculation of the information matrix, for (a) 
9jo random, (b) 9j\ random. 



which is 98.4% as efficient as Longford's method. The optimal exact design on 6 points for 
the fixed effects model is 93.2% as efficient as Longford's design. Similar patterns are seen as 
the CV is increased to 30% and 50%, with Longford's method producing the most efficient 
designs, followed by the more precise method, and with the fixed effects design being the 
least efficient. It is interesting to note that in this case the optimal design for the fixed effects 
model is always more than 90% as efficient as the most efficient method which incorporates 
the random effects, regardless of the size of the CV. 

Table 7.2 shows the same information as in the previous table for the case where the direct 
effect parameter (6j\) varies randomly between subjects, and the other two parameters are 
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fixed. When the CV is 10%, the hxed effects design is actually about 4% more efficient 
than the most efficient design incorporating the random coefficient. These results are rather 
unexpected. The inclusion of the random coefficient in the design process is a substantially 
complex problem, but the exceptionally simpler information matrix for the hxed effects 
model produces a more efficient design in some cases. The large amount of error due to the 
Taylor expansion of the log-likelihood, as shown in Figure 7.1, may have a detrimental effect 
on the quality of designs found using this approximation. The hxed effects design does not 
perform as well for the larger values of CV, but is always more efficient than the designs 
found using the more precise approximation to the information matrix. 

To check that these results are not specific to these parameter values, the designs are 
regenerated for a new set of parameter values, with larger direct and carryover effects: 

(3 = E(0j) = (-1, 0.5, 0.25 x 0.5)' = (-1, 0.5, 0.125)'. 

The coefficient of variation is hxed at 30%. The results are shown in Table 7.3. The optimal 
6-point exact hxed design for these parameters has 4 points at d\ = 0, d 2 = 5.58, and 2 
points at di = 20, d 2 = 0. 

When comparing the designs found using the two methods incorporating the random 
coefficients, it can be seen that the designs found using the more precise approximation 
again produces slightly less efficient designs. The hxed effects design also produces the most 
efficient design in one of the two cases. 

Approximate designs 

Optimal approximate designs are given in Tables 7.4 and 7.5 and in Figures 7.2 and 7.3. 
The approximate hxed effects design from the previous chapter is also given as a plot in 
Figure 7.4. As for the plots of the approximate designs in Chapter 5, each support point is 
represented by its position in the xy-plane, and its corresponding weight is proportional to 
the size (area) of the plotted point. 

From Figure 7.2 it can be seen that the size of the CV makes very little difference to 
the positions and weights of the support points when the intercept parameter is random. 
Although for the Ml method, the smaller design point actually disappears altogether when 
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Table 7.1: Exact optimal designs when the intercept parameter (6jq) is random 





Optimisation 
Method 


d\ 


d 2 




Criterion 




cv 


\M L 1 /(p+?) 


|M w | 1/(p+9) 


\M N \ l ' { - p+ ^ 


10% 


M L 


0.00 
0.00 
0.00 
0.00 
20.00 
20.00 


6.67 
6.67 
6.67 
6.67 
0.00 
0.00 


6.9148 




5.6228 




M w 


0.00 
0.00 
0.00 
8.45 
20.00 
20.00 


5.08 

5.08 
5.08 
0.00 
0.00 
0.00 




6.5103 


5.5333 




Fixed effects 


design 




- 


- 


5.2431 


30% 


M L 


0.00 

0.00 

0.00 

0.00 

20.00 

20.00 


6.76 
6.76 
6.76 
6.76 
0.00 
0.00 


6.7324 




5.3599 




M w 


0.00 
0.00 
0.00 
9.06 
20.00 
20.00 


5.06 
5.06 
5.06 
0.00 
0.00 
0.00 




6.2209 


5.2577 




Fixed effects 


design 




- 


- 


5.0483 


50% 


M L 


0.00 

0.00 

0.00 

0.00 

20.00 

20.00 


6.91 
6.91 
6.91 
6.91 

0.00 
0.00 


6.3981 




4.9049 




M w 


0.00 

0.00 

0.00 

10.20 

20.00 

20.00 


5.08 

5.08 
5.08 
0.00 
0.00 
0.00 




5.6863 


4.7865 




Fixed effects 


design 




- 


- 


4.6927 
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Table 7.2: Exact optimal designs when the direct effect parameter (6ji) is random 





Optimisation 
Method 


d\ 


d 2 




Criterion 




cv 


\M L 1 /(p+<?) 


\M W \ 1 ^ P+ ^ 


|Afjv| 1/(p+9) 


10% 


M L 


0.00 
0.00 
8.82 
8.82 
8.82 
20.00 


0.00 
9.08 
7.89 
7.89 
7.89 
0.00 


37.8542 




34.9416 




M w 


0.00 
6.37 
6.37 
6.37 
6.37 
20.00 


0.00 
6.04 
6.04 
6.04 
6.04 
0.00 




27.0181 


32.7066 




Fixed effects 


design 




- 


- 


36.3835 


30% 


M L 


0.00 

0.00 

8.51 

8.51 

20.00 

20.00 


0.00 
9.10 
8.33 
8.33 
0.00 
0.00 


35.1658 




42.8165 




M w 


0.00 
5.66 
5.66 
5.66 
5.66 
20.00 


0.00 
5.39 
5.39 
5.39 
5.39 
0.00 




23.1245 


31.2136 




Fixed effects 


design 




- 


- 


38.5235 


50% 


M L 


0.00 

0.00 

0.00 

8.60 

20.00 

20.00 


9.17 
9.17 
9.17 

8.27 
0.00 
0.00 


31.4545 




28.5358 




M w 


0.00 
5.28 
5.28 
5.70 
5.70 
5.72 


4.09 
20.00 
20.00 
0.00 
0.00 
3.59 




17.5568 


16.2869 




Fixed effects 


design 




- 


- 


26.8325 
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(a) 



(b) 



(c) 
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FIGURE 7.2: Approximate optimal designs when the intercept parameter (Ojo) is random, using 
method M L with (a) CV = 10%, (b) CV = 30%, (c) CV = 50%; and using method M w with (d) 
CV = 10%, (e) CV = 30%, (f) CV = 50%. 



the CV is increased to 50%. The difference between the designs generated by the Ml and 
Mw methods is minor, and all are fairly similar to the fixed effects design, with the major 
exception of the (0,0) point being moved to a higher first dose for the Mw method. 



The designs given in Figure 7.3 for a random direct effect parameter show a marked 
difference in the two methods of design construction, as well as a more significant effect 
of the size of CV than in the case of a random intercept. All of these designs are also 
significantly different from the fixed effects design. The only support point in common 
between all 7 designs in Figures 7.3 and 7.4 is (20,0). The variability of designs appear to 
reflect the high variability in criterion values of the exact designs for this random parameter 
given in Table Table 7.2. 
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FIGURE 7.3: Approximate optimal designs when the direct effect parameter (0ji) is random, 
using method M L with (a) CV = 10%, (b) CV = 30%, (c) CV = 50%; and using method M w 
with (d) CV = 10%, (e) CV = 30%, (f) CV = 50%. 



CM o 

13 - 




FIGURE 7.4: Approximate optimal design for the fixed effects model. 
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7.3 Discussion 

In this chapter, two methods of computing optimal designs for logistic regression models 
with random coefficients were considered. Both involved the construction of the information 
matrix by using a Taylor expansion of the log-likelihood. One of the methods involved 
further simplifications and assumptions, the other method did not. 

The information matrices calculated by these two methods were compared to a more 
computationally intensive numerical method, assumed to be an accurate representation of 
the true information matrix. It was shown that for these models, the method with no further 
simplifications and assumptions produced more accurate information matrices than the more 
approximate method, as expected. 

Even though the much more complex method, where the only simplification was the use 
of the Taylor expansion, was shown to give information matrices significantly closer to the 
true values, the optimal designs based on these methods were found in all cases to be slightly 
less efficient than designs found using the much simpler, quicker method. 

In fact, the use of the simpler fixed effects model produced designs which were in most 
cases almost as efficient as the designs incorporating the random effects. When the direct 
effect parameter in the model is allowed to vary between subjects, the fixed effects design is 
actually sometimes slightly more efficient than the designs found using the more sophisticated 
methods. In light of these results, it may not be worth incorporating random coefficients 
into the design process at all until more accurate methods of constructing the information 
matrix are developed. 

The reason for the unexpected low efficiencies for the design optimisation methods incor- 
porating random effects is not immediately apparent, and may be a topic of future research 
in this area. 

However, rather than looking for a more accurate information matrix for these models, 
it may be more beneficial to look to alternative models for this data which simplifies the 
algebra. Two such alternatives currently under consideration are: the probit link function 
for binary data, which produces a very similar response curve to the logit link function used 
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here, but which appears to be somewhat simpler to work with algebraically; and the beta- 
binomial model which, after some preliminary investigation, may have the same advantage. 
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Table 7.3: Exact optimal designs for a different set of parameter values, with CV = 30%. 



Random 
Coefficient 



Optimisation 
Method 



Criterion 



d\ 



do 



\M L \ 1/{p+q) \Mw\ 1/ip+g) 



|Mv| 1/(p+9) 







jo 



Mr 



(intercept) 



0.00 
0.00 
0.00 
5.16 
5.16 
20.00 



4.32 
4.32 
0.77 
0.00 
0.00 
0.00 



4.3425 



M 



w 



0.00 
0.00 
3.78 
5.64 
5.64 
20.00 



2.37 
2.37 
0.00 
0.00 
0.00 
0.00 



4.0001 



Fixed effects design 



3.5382 



3.4771 



3.3159 



6ji 

(direct effect) 



Mr 



0.00 
0.00 
0.00 
5.24 
5.24 
20.00 



5.58 
5.58 
5.58 
4.83 
4.83 
0.00 



14.0018 



M 



w 



0.00 
0.00 
3.59 
3.59 
3.59 
20.00 



0.00 
3.30 
3.49 
3.49 
3.49 
0.00 



10.0803 



Fixed effects design 



14.7351 



14.1752 



18.1463 
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Table 7.4: Approximate optimal designs when the intercept parameter (6jo) is random 





Optimisation 
Method 


d\ 


d 2 


w 


Criterion 


cv 


\M L \ l K p+c f> |MhH 1/(p+,) 


10% 


M L 


0.00 


0.00 


0.0312 


1.1529 






0.00 


6.75 


0.6327 








20.00 


0.00 


0.3361 






M w 


0.00 


5.39 


0.5723 


1.0862 






9.05 


0.00 


0.0995 








20.00 


0.00 


0.3282 




30% 


M L 


0.00 


0.0 


0.0250 


1.1224 






0.00 


6.82 


0.6382 








20.00 


0.00 


0.3368 






M w 


0.00 


5.41 


0.5719 


1.0380 






9.74 


0.00 


0.0995 








20.00 


0.00 


0.3286 




50% 


M L 


0.00 


6.91 


0.6624 


1.0664 






20.00 


0.00 


0.3376 






M w 


0.00 


5.37 


0.5548 


0.9487 






10.87 


0.00 


0.1227 








20.00 


0.00 


0.3226 
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Table 7.5: Approximate optimal designs when the direct effect parameter (6ji) is random 





Optimisation 
Method 


di 


d 2 


w 


Criterion 


cv 


\M L \ l '( p+ ^ \M w \ l l { P +q ) 


10% 


M L 


0.00 


0.00 


0.2350 


6.4381 






8.52 


8.28 


0.5098 








20.00 


0.00 


0.2552 






M w 


0.00 


0.00 


0.2393 


4.5958 






6.48 


6.00 


0.6305 








20.00 


0.00 


0.1302 




30% 


M L 


0.00 


9.14 


0.4719 


5.9436 






8.69 


8.17 


0.2655 








20.00 


0.00 


0.2626 






M w 


0.00 


0.00 


0.2283 


3.9108 






5.75 


5.39 


0.6361 








20.00 


0.00 


0.1356 




50% 


M L 


0.00 


9.18 


0.5237 


5.2801 






8.75 


8.08 


0.2056 








20.00 


0.00 


0.2707 






M w 


0.00 


4.09 


0.1290 


4.1079 






4.14 


3.71 


0.0928 








5.05 


20.00 


0.6618 








20.00 


0.00 


0.1164 
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Conclusion 



This thesis has described a number of recent developments in optimal design of experiments 
for nonlinear and generalised linear models, with applications to real-life and hypothetical 
examples arising in pharmacology and other areas. While the application of these techniques 
has been novel in itself, this thesis has also presented new methods of designing experiments 
with multiple objectives (model discrimination and parameter estimation), as well as designs 
for generalised linear models with random effects. 

Product optimal designs for two competing nonlinear models were shown to be efficient 
for estimation of parameters in both models, but they can be quite poor for discriminat- 
ing between the two models according to the T-optimality criterion, particularly when the 
number of support points is small. The use of both conditional and hybrid designs allow 
product optimal designs to be augmented with additional support points from T-optimal de- 
signs (or points close to the T-optimal designs in the case of conditional designs), which can 
increase the T-efficiencies significantly without having a great impact on the D-efficiencies. 
The converse is seen when starting with a T-optimal design and adding support points from 
the product optimal design. The hybrid designs seem to be very similar to the conditional 
designs in terms of support points/weights and T- and D-efficiencies, so the use of these 
designs would be preferable to the conditional designs, as the construction of the hybrid 
designs requires no further optimisation once the T-optimal and product optimal designs 
have been found. 
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Although the T-optimality criterion provides a measure of dissimilarity of the two models 
at the design points, a better approach to the evaluation of the designs in terms of model 
discrimination may be to conduct power tests by simulation in a similar manner to the 
methods used for the calculation of power for generalised linear models given in this thesis. 
It would be interesting to examine the relationship between the T-optimality criterion and 
the power found by these simulations. This is an area for future investigation. Further topics 
of research could also include the evaluation of the use of conditional and hybrid designs for 
discrimination between more than two nonlinear models, with the additional goal of efficient 
parameter estimation under each model. Atkinson and Fedorov (1975b) have defined the 
T-optimality criterion for such a situation. The methods of construction of conditional and 
hybrid designs described in this thesis are general enough to apply to alternative criteria 
such as this. 

The product criterion for parameter estimation in nonlinear models was applied to a real- 
life design optimisation problem in pharmacokinetics, in which the aim was to discriminate 
between two candidate models as well as to estimate the parameters efficiently under each 
model. The models involved were rather complex, involving between subject variability and 
multiple responses, and the solution to the differential equations describing the model for 
nonlinear elimination does not exist in closed form. Numerical solutions to these equations 
were able to be incorporated into the design process, as well as the random effects and multi- 
ple responses. The resulting product criterion was shown by simulations to produce a design 
which is efficient in terms of both model discrimination and parameter estimation. This is 
the third published case of the prospective use of optimal design for a pharmacokinetic ex- 
periment (the hrst involving nonlinear elimination), and illustrates the usefulness of optimal 
design as opposed to more traditional empirical methods of design in this area. Some ideas 
investigated in that work are now being adopted by a major international pharmaceutical 
company for the design of their pharmacokinetic studies. 

The experiment itself was constrained by the experimenters in a number of aspects, 
but the design process was also under time constraints. In the limited time available, the 
methods were researched and applied to these models, and the computationally intensive 
optimisation was carried out. Given more time and resources, a more sophisticated method 
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of incorporating multiple responses for mixed effects models could be considered, although 
the simplifications and assumptions made here seem to be fairly inconsequential, as the 
standard errors predicted by the approximated information matrix are acceptably close to 
those estimated from simulated data. The use of numerical derivatives could also be replaced 
by a more elegant method such as the 'direct method' described in Atkinson and Bogacka 
(2002). 

The method of constructing hybrid designs for nonlinear models was adapted to designs 
for generalised linear models, where again the objective of the experiment is to discriminate 
between two models (nested in this case), and to efficiently estimate the parameters under 
each model. Conditional designs were not used here as the differences between conditional 
and hybrid designs appeared to be minimal for nonlinear models, and the construction of 
conditional designs is much more computationally intensive. Similar trends were seen for 
GLMs as for nonlinear models when using hybrid designs, with a reasonable trade-off between 
estimation and discrimination. In addition to the comparison of T- and .D-optimality criteria, 
these hybrid designs were evaluated by power tests involving large simulations. While the 
difference in power between the product optimal and T-optimal designs is often minimal, a 
distinct trend of increasing power is seen as the hybrid designs place more weight on the T- 
optimal design points, and vice versa. This is encouraging, as it shows that the T-optimality 
criterion does actually produce designs which are more effective than others in terms of 
model discrimination. 

While it has been seen that the T-optimality criterion value of a design increases with its 
power to correctly discriminate between two models in these cases, the actual relationship 
between the two is unknown. Given the information matrix for a design, the standard errors 
of parameter estimates can be predicted by the diagonal elements of the inverse of the matrix. 
It would be beneficial to approximate the power of a design (very computationally expensive 
to compute using simulations) in a manner analogous to this, based on the design's T- 
optimality criterion. As for nonlinear models, the construction of hybrid designs for GLMs 
described in this thesis can be generalised to apply to three or more models, using the 
T-optimality criterion for these situations as described by Ponce de Leon and Atkinson 
(1992), and the product of D-optimality criteria for all models for the product design. These 
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techniques are also easily applied to models with more than one or two explanatory variables, 
not generally addressed in research in optimal design for GLMs. The T^-optimality criterion 
outlined in this thesis has more attractive distributional qualities than the T-optimality 
criterion, although its optimisation is at present a computationally intractable problem. 
Further work on this or a related criterion may lead to methods of producing designs more 
effective for model discrimination. 

To complement the real-life example of optimal design for a nonlinear pharmacokinetic 
model, the optimal design of a hypothetical pharmacodynamic model involving a logistic 
regression model was considered. Optimisation of crossover and parallel (single period) de- 
signs were shown to offer a significant improvement over more traditionally used balanced 
design with an increase in parsimony due to the significant decrease in the number of sup- 
port points. Although the designs found are locally optimal, sensitivity analyses show that 
these designs are still quite efficient if the parameter values are misspecified to a reasonably 
large degree. Designs which are composites of optimal crossover and parallel designs afford 
experimenters greater flexibility in such trials without adversely affecting the efficiency of 
parameter estimates. 

It would be interesting to extend these methods to account for crossover designs with more 
than two periods. Although a carryover effect of greater than one period is not generally 
considered in these models in most situations, second-order (or higher) carryover effects 
into the models may be incorporated into the design process. Optimal designs under these 
conditions may or may not have the same properties as those presented in this thesis. 

Generalised linear mixed models have been neglected in the optimal design literature, 
and the work presented in this thesis highlights the enormous technical and computational 
difficulties associated with design optimisation for such models. Two approximations to 
the information matrix for a logistic regression model were given, with one, M\y, involving 
less simplification than the other, Ml, but both relying on a Taylor expansion of the log- 
likelihood function. Using a computationally intensive numerical method, M^, to compare 
designs found using the two approximate methods, the optimal designs based on Mw, the 
matrix using less simplification (and much greater algebraic complexity), was found to be 
slightly less efficient than designs produced by Ml- Designs optimised using the much 
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simpler information matrix which ignored the random effects altogether were actually found 
in some cases to be more efficient than the two methods which incorporated the random 
effects. 

The reasons for the unexpected differences in efficiency between these methods remain 
to be understood, and may be investigated further. A more profitable approach, however, 
may be to focus instead on alternative models such as the beta-binomial model or the same 
model used here with the logit link function replaced by the probit link function, which 
has a similar response curve. Early exploration has shown that these models may be more 
algebraically manageable. Once issues with the information matrix have been adequately 
dealt with, the crossover designs could be expanded as described above to include more 
periods and higher-order carryover effects. 

Another aim of the design of an experiment not mentioned in this thesis, which may be 
addressed by optimal design, is that of reducing the bias of parameter estimates in nonlinear 
and generalised linear models. This is a significant area of future research. Optimal design 
for mixed effects models in general also require further investigation. The hxed and random 
effects may be treated separately in a type of compound criterion, or in the case where 
random effects are treated as nuisance parameters, we may adjust for them through the use 
of Z? s -optimal designs. Bayesian methods for experimental design, such as those outlined 
in Chaloner and Verdinelli (1995) and Clyde (2001), are also being considered for many 
problems discussed in this thesis. 
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Appendix A 



Appendix: Source code 



This appendix contains MATLAB code (M-files) used to find the optimal designs given in 
this thesis. The search algorithms described in Section 2.4, simulated annealing and the 
cross-entropy method, are given first, followed by some miscellaneous additional functions 
used by these algorithms. 

A.l anneal. m: Simulated annealing algorithm 

function [X,FVAL,elapsedtime,EXITFLAG, OUTPUT, LAMBDA, GRAD, HESSIAN] = ... 

anneal (FUN , X , LB , UB , options , varargin) 
°/ ANNEAL Finds the constrained maximum of a function of several variables 
% by simulated annealing. 
7. 

% ANNEAL solves problems of the form: 
'/, max F(X) subject to: LB <= X <= UB 

•/. 

•/, X= ANNEAL ( FUN, XO, LB, UB) starts at XO and finds a maximum X to the 

% function FUN, subject to the lower and upper bounds LB and UB. FUN 

% accepts input X and returns a scalar function value F evaluated at X. 

% XO may be a scalar, vector, or matrix. 

7. 

7, X=ANNEAL (FUN, XO, LB, UB, OPTIONS) maximises with the default 

7o optimisation parameters replaced by values in the structure OPTIONS. 

°L Use OPTIONS = [] as a place holder if no options are set. 

7o The list of options available: 

7. 

°L initprob initial probability of accepting downhill steps 

7, (default = 0.95) 
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Appendix: Source code 



cool 

ncyct 

ncycs 

ncyci 

vmin 

maxit 

dispint 

Vstep 

weighted 



cooling rate: temp = cool*temp 

(default =0.9) 

number of cycles between changes in temperature 

(default = 20) 

number of cycles between changes in step size 

(default = 10) 

number of initial cycles (to calculate temperature) 

(default = 100) 

minimum change in step size 

(default = le-3) 

maximum number of iterations 

(default = le7) 

number of temperature changes between displaying 

interim results (default = 5) 

initial step size 

(default = UB-LB) 

weighted = 1 : constrain last column of X to sum to 1 

(default = 1) 



X=ANNEAL (FUN, X0, LB ,UB, OPTIONS, PI, P2, .. .) passes the 
problem-dependent parameters P1,P2,... directly FUN: 

feval(FUN,X,Pl,P2, . . .) . 
Pass empty matrices for OPTIONS, LB, and UB to use the 
default values. 

[X,FVAL]=ANNEAL(FUN,X0, . . .) returns the value of the objective 
function FUN at the solution X. 

Examples 

FUN can be specified using @: 

X = anneal (Ohumps, . . .) 
In this case, F = humps (X) returns the scalar function value F of 
the HUMPS function evaluated at X. 

FUN can also be an inline object: 

X = anneal(inline('3*sin(x(l))+exp(x(2))'), [1 ; 1] , [0;0] , [1;1]) 
returns X = [0 ; 0] . 

Calling with options: 

options = struct( 'vmin' , le-5, 'ncyct ' ,20, 'ncycs' ,60, .. . 

'weighted' ,1, 'dispint' , 1) ; 
X = anneal (Of un,x0, lb, ub, options) 
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% Copyright 2005 Tim Waterhouse, all rights reserved 

if nargin < 4, error ( 'ANNEAL requires at least four input arguments'); end 
if nargin < 5 

options = struct ( ' initprob ' , . 95 , ' cool ',0.9,' ncyct ' , 10 , ' ncycs ' , 20 , . . . 
'ncyci' ,100, 'vmin' ,le-3, 'maxit' ,le7, 'dispint' ,5, 'Vstep' ,UB-LB, 'weighted' ,1) ; 
else 

% initial probability of accepting downhill steps 
if ~isfield(options, 'initprob') 

options . initprob = 0.95; 
end 

% cooling rate: temp = cool*temp 
if ~isf ield(options, ' cool') 

options. cool =0.9; 
end 

'/, number of cycles between changes in temperature 
if ~isfield(options, 'ncyct') 

options .ncyct = 10; 
end 

% number of cycles between changes in step size 
if ~isfield(options, 'ncycs') 

options .ncycs = 20; 
end 

% number of initial cycles (to calculate temperature) 
if ~isfield(options, 'ncyci') 

options .ncyci = 100; 
end 

% minimum change in step size (stop when this value is reached) 
if ~isf ield(options, 'vmin') 

options. vmin = le-3; 
end 

% maximum number of iterations 
if ~isfield(options, 'maxit') 

options .maxit = le7; 
end 

% number of temperature changes between displaying interim results 
if ~isfield(options, 'dispint ') 

options . dispint = 5; 
end 

% initial step size 
if ~isfield(options, 'Vstep' ) 

options .Vstep = UB-LB; 
end 
7, weighted = 1 : constrain last column of X to sum to 1 
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if ~isfield(options, 'weighted' ) 

options .weighted = 1; 
end 
end 

[m,n]=size(X) ; 

Vstep = options .Vstep; 

b = -inf; 

start = clock; 

cr=0 ; % counter for number of iterations 

crt=0; % counter for number of tempe changes 

RJCT=zeros(m,n) ; '/, vector of number of rejects per ncycs iterations 

if options .weighted 

X(:,n) = X(:,n)./sum(X(:,n)); 
end 

bestX = X; 

d = feval(FUN,X,varargin{:}) ; 
opt = d; 
best = opt ; 
cnt = ; 

umax = 0; omin = inf; omax = best; 
for i = 1 : options .ncyci 
cnt = cnt+1 ; 
for j=l:m 

for k = 1 :n 

tmpX = -ones(m,n) *Inf ; 

tmpX = X; 

tmpX(j,k) = min(max(tmpX(j ,k) + (rand*2-l) *Vstep( j ,k) , ... 

LB(j,k)),UB(j,k)); 
if options .weighted 
if k == n 

tmpX(:,n) = tmpX( : ,n) . /sum(tmpX( : ,n) ) ; 
end 
end 

dl = feval(FUN,tmpX,varargin{:}) ; 
if dl == -Inf 

dl = d; 
end 
if abs(d - dl) > umax 

umax = abs(d - dl) ; 
end 
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if dl > omax 
omax = dl ; 
bestX = tmpX; 
end 
if dl < omin 

omin = dl ; 
end 
end 
end 
if rem(cnt , options .dispint)==0 

disp([cnt omax]) 
end 
end 

[umax, omax, omin, omax-omin] 

best = omax; 

opt = omax ; 

if umax == I umax == Inf 

temp = le3 
else 

temp = -umax/log(options . initprob) 
end 

cnt = ; 
bestX 

save interim. mat 

while max(max(Vstep. /(UB-LB) )) > options. vmin & cr < options .maxit 
% restart with current best 
X = bestX; 
for cyct = 1 : options .ncyct 

for eyes = 1 : options .ncyes 
for j=l:m 

for k = 1 :n 

cr = cr + 1 ; 

tmpX = X; 

if size(X) ~= [m,n] 

X 
end 
tmpX(j.k) = X(j,k) + (rand*2-l)*Vstep(j ,k) ; 

if tmpX(j.k) < LB(j,k) I tmpX(j,k) > UB(j,k) 

d = -Inf; 
else 
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if options .weighted 
if k == n 

tmpX(:,n) = tmpX( : ,n) . /sum(tmpX( : ,n) ) ; 
end 
end 

d = f eval(FUN,tmpX,varargin{ : }) ; 
end 
if d > opt 

X = tmpX; 
opt = d; 
else 

if exp( (d-opt) /temp) >rand 
X = tmpX; 
opt = d; 
else 

RJCT(j,k) = RJCT(j,k) + 1; 
end 
end 

if opt > best 
bestX = X; 
best = opt ; 
end 
end 
end 
end 

FRJCT = max(RJCT/options.ncycs,0.01) ; 
c_sa = min(temp,0 .01) ; 
Vstep=Vstep . / (FRJCT/ . 5) +c_sa ; 
RJCT = zeros(m,n); 
end 

if options .weighted 

[I, J]=ind2sub(size(bestX( : ,n)) , . . . 

f ind(bestX( : ,n)<options. vmin & Vstep( : ,n) <options . vmin) ) ; 
if length(I) > 
tmpX = bestX; 
tmpX(I,:) = [] ; 

tmpX(:,n) = tmpX( : ,n) . /sum(tmpX( : ,n) ) ; 
d = f eval(FUN,tmpX,varargin{: }) ; 
if d >= best 

bestX = tmpX; 
best = d; 
Vstep(I,:) = [] ; 
RJCTCI, :) = []; 
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LB(I,:) = [] ; 
UB(I,:) = [] ; 
m = m-length(I) ; 
end 
end 
end 

temp = temp*options . cool; 

cnt = cnt+1 ; 

if rem(cnt , options .dispint)==0 

disp([temp mean (mean (FRJCT) ) max(max(Vstep . /(UB-LB) ) )] ) 

disp(bestX) 

best 
end 

save interim. mat 
if options .weighted & m==l 

break 
end 
end 

if cr >= options .maxit 

disp( 'Maximum number of iterations reached') 
else 

disp( 'Cooling finished.') 
end 

disp( 'Elapsed time:') 
elapsedtime = etime(clock, start) 
disp( 'Number of temperature changes:') 
disp(cnt) 

if options .weighted 

X = sortrows(collapse(bestX, options .vmin)) 
else 

X = sortrows(bestX) 
end 
FVAL = best 

save interim. mat 



A. 2 crossentropy.m: The cross-entropy method 

function [bestx,bestcrit , elapsedtime] = ... 

crossentropy (FUN, LB, UB, options, varargin) 
7 CR0SSENTR0PY Finds the constrained maximum of a function of several 
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'I, variables by the cross-entropy method. 

'/. 

% CROSSENTROPY solves problems of the form: 

'/, max F(X) subject to: LB <= X <= UB 

'/. 

°L FUN = F(X), function containing the criterion to be maximised 

'/, LB = Lower bound on the design matrix 

°/, UB = Upper bound on the design matrix 

'/. 

% Copyright 2005 Tim Waterhouse, all rights reserved 

start = clock; 

if nargin < 3, error ('CE requires at least three input arguments'); end 

if nargin == 3 

options = struct( 'tol' , le-2, 'rho' ,0 . 1 , . . . 

'alphal' ,0.9, 'alpha2' ,0.3, 'N' , 1000, 'weighted' , 1 , 'dispirit' , 10, . . . 

' num_inj ect ' , 3 , ' h_inj ect ' , 2) ; 
else 

if ~isf ield(options, 'tol') 

options. tol = le-2; 
end 
if ~isf ield(options, 'rho') 

options. rho =0.1; 
end 
if ~isfield(options, 'alphal ') 

options . alphal = 0.9; 
end 
if ~isf ield(options, 'alpha2') 

options .alpha2 = 0.3; 
end 
if ~isf ield(options, 'N' ) 

options. N = 1000; 
end 
if ~isfield(options, 'weighted' ) 

options .weighted = 1; 
end 
if ~isf ield(options, 'dispint ' ) 

options .dispint = 10; 
end 
if "isfi eld (opt ions, 'num_inject ' ) 

options .num_inj ect = 3; 
end 
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if "isf ield(options, 'h_inject') 

options .h_inject = 2; 
end 
end 

[r, c] = size(LB) ; 

LB = reshape(LB,r*c, 1) ; 

UB = reshape(UB,r*c,l) ; 

% initial estimates of mu and sigma 
mu = LB + (UB-LB)/2; 
sigma = (UB-LB)/4; 

bestx = mu; 

bestcrit = feval (FUN, reshape (mu,r, c) ,varargin{ :}) ; 

currbest = bestcrit; 

inject = 0; 

t=0; 

currbestx=[] ; 

while inject <= options. num_inject 
t=t+l; 
x=[]; 

critvec= [] ; 
for s=l : options .N 

tmp = normt_rnd(mu, sigma. "2 ,LB,UB) ; 

if options .weighted 

k=sum(tmp(r*(c-l)+l :r*c)) ; 
tmp(r*(c-l)+l :r*c)=tmp(r*(c-l)+l:r*c)/k; 

end 

critvec = [critvec, feval (FUN, reshape (tmp, r,c) ,varargin{: })] ; 

x = [x, tmp] ; 
end 

[critsort , ind] =sort (critvec) ; 

ind = ind(ceil(options.N*(l-options .rho)) : options. N) ; 
currbestx=[] ; 
for i = ind 

currbestx = [currbestx, x(:,i)]; 
end 

mu = options . alphal*mean(currbestx, 2) + (1-options . alphal) *mu; 
sigma = options. alpha2*std( currbestx, 1 ,2) + (1-options . alpha2) *sigma; 
q = max (sigma) ; 
prevbest = currbest; 
currbest = max (critvec) ; 
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if q <= options. tol 

inject = inject + 1; 

sigma = sigma + abs(currbest-prevbest)*options .h_inject; 
q = max (sigma) ; 
end 
if currbest > bestcrit 

bestx = x( : , ind(options .rho*options .N)) ; 
bestcrit = currbest; 
end 

if rem(t , options . dispint)==0 
reshape(bestx,r , c) 
[bestcrit ,q] 
end 

bestx = reshape (bestx, r, c) ; 
mu = reshape (bestx, r , c) ; 
sigma = reshape(sigma,r , c) ; 
LB = reshape (LB, r, c) ; 
UB = reshape (UB, r, c) ; 
if options .weighted 

[I, J] =ind2sub (size (bestx ( : , c)) , . . . 

f ind(bestx( : , c) <options . tol & sigma( : , c) <options . tol)) ; 
if length(I) > 
tmpx = bestx; 
tmpx(I, :) = [] ; 

tmpx(:,c) = tmpx( : , c) . /sum(tmpx( : , c) ) ; 
d = feval (FUN, tmpx, varargin{: }) ; 
if d >= bestcrit 
bestx = tmpx; 
bestcrit = d; 
mu(I,:) = [] ; 
sigma(I, :) = [] ; 
LB(I,:) = [] ; 
UB(I,:) = [] ; 
r = r-length(I) ; 
end 
end 
end 

bestx = reshape(bestx,r*c, 1) ; 
mu = reshape(bestx,r*c, 1) ; 
sigma = reshape (sigma, r*c, 1) ; 
LB = reshape (LB, r*c, 1) ; 
UB = reshape (UB,r*c,l) ; 

end 
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if options .weighted 

bestx=sortrows(collapse(reshape(bestx,r,c) , options. tol)) ; 
else 

bestx=sortrows (reshape (bestx , r , c) ) ; 
end 
elapsedtime = etime(clock, start) ; 

A. 3 Miscellany 

A. 3.1 collapse, m 

The following function 'collapses' an approximate design to its simplest form. That is, it 

combines equal (or close to equal) support points into a single support point with a weight 

equal to the sum of the weights of its constituent points. It also removes any points with 

zero weight (or close to zero weight). 

function Xnew = collapse(X,tol) 

% Collapses an approximate design to its simplest form 

[n,m] = size(X) ; 

% Combine identical (or nearly identical) support points 
for guava = l:(n-l) 

for cumquat = (guava+l):n 

if sum ( abs((X (guava, 1 : (m-1) ) -X( cumquat , 1 : (m-1))))) < tol 
X(guava,m) = X(guava,m) + X ( cumquat , m) ; 
X( cumquat, m) = 0; 
end 
end 
end 

% Remove support points with nearly zero weight 
[I, J] = find(X(: ,m)<tol); 
X(I,:) = []; 

Xnew = X; 

A. 3. 2 normt_rnd.m 

The normt_rnd function draws a random sample from a truncated normal distribution. This 
M-hle is not original work, and was downloaded from the following location: 
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http : //www. spatial-econometrics . com/distrib/normlt_rnd.m 

function result = normt_rnd(mu,sigma2,left , right) 

°L PURPOSE: random draws from a normal truncated to (left, right) interval 

y, 

'I, USAGE: y = normt_rnd(mu,sigma2, left , right) 

'It where: mu = mean (nobs x 1) 

'L sigma2 = variance (nobs x 1) 

'L left = left truncation points (nobs x 1) 

'It right = right truncation points (nobs x 1) 

y, 

'/. RETURNS: y = (nobs x 1) vector 

I 

°L NOTES: use y = normt_rnd(mu, sigma2, left ,mu+5*sigma2) 

'It to produce a left-truncated draw 

°/t use y = normt_rnd(mu, sigma2,mu-5*sigma2, right) 

°/t to produce a right-truncated draw 

% 

'/, SEE ALSO: normlt_rnd (left truncated draws), normrt_rnd (right truncated) 
'/. 

'/, adopted from Bayes Toolbox by 

'/, James P. LeSage, Dept of Economics 

'It University of Toledo 

I 2801 W. Bancroft St, 

7. Toledo, OH 43606 

°/t jpl@jpl.econ.utoledo.edu 

'I, For information on the Bayes Toolbox see: 

'It Ordinal Data Modeling by Valen Johnson and James Albert 

'/. Springer-Verlag, New York, 1999. 



if nargin ~= 4 

error ( 'normt_rnd: wrong # of input arguments'); 

end; 

std = sqrt (sigma2) ; 

'It Calculate bounds on probabilities 

lowerProb = Phi ( (left-mu) . /std) ; 

upperProb = Phi ( (right -mu) . /std) ; 
°/t Draw uniform from within (lowerProb, upperProb) 

u = lowerProb+(upperProb-lowerProb) . *rand(size(mu) ) ; 
'It Find needed quantiles 
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result = mu + Phiinv(u) . *std; 

function val=Phiinv(x) 

'/, Computes the standard normal quantile function of the vector x, 0<x<l 

val=sqrt(2)*erfinv(2*x-l) ; 

function y = Phi(x) 

7o Phi computes the standard normal distribution function value at x 

y= .5*(l+erf (x/sqrt(2))) ; 
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